Kalman Filter

Implementation of Kalman filters in PyTorch, with parameter optimization via gradient descent

Introduction

The model uses a latent state variable \(x\), modelled over time, to impute gaps in \(y\).

The assumption of the model is that the state variable \(x_t\) at time \(t\) depends only on the previous state \(x_{t-1}\) and not on any earlier states (the Markov property).

Equations

The equations of the model are:

\[\begin{align} p(x_t | x_{t-1}) & = \mathcal{N}(Ax_{t-1} + b, Q) \\ p(y_t | x_t) & = \mathcal{N}(Hx_t + d, R) \end{align}\]

where:

  • \(A\) is the state transition matrix
  • \(b\) is the state transition offset
  • \(Q\) is the state transition covariance matrix
  • \(H\) is the observation matrix
  • \(d\) is the observation offset
  • \(R\) is the observation covariance matrix

In addition, the model has the parameters of the initial state, which are used to initialize the filter:

  • m0, the initial state mean (\(m_0\))
  • P0, the initial state covariance (\(P_0\))
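
Putting the two model equations together, one step of the generative model can be sketched as follows (a minimal sketch with assumed shapes; simulate_step is a hypothetical helper, not part of the library):

import torch

def simulate_step(A, b, Q, H, d, R, x_prev):
    # x_t ~ N(A x_{t-1} + b, Q);  y_t ~ N(H x_t + d, R)
    x_t = A @ x_prev + b + torch.distributions.MultivariateNormal(
        torch.zeros(A.shape[0]), Q).sample()
    y_t = H @ x_t + d + torch.distributions.MultivariateNormal(
        torch.zeros(H.shape[0]), R).sample()
    return x_t, y_t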

The Kalman filter has 3 steps:

  • predict (estimate the state at time t using the observations up to time t-1)
  • update (correct the state at time t using the observation at time t)
  • smooth (refine the state at time t using the information from later observations)

In case of missing data the update step is skipped.

After smoothing, the missing data at time t (\(y_t\)) can be imputed from the state (\(x_t\)) using this formula: \[p(y_t|x_t) = \mathcal{N}(Hx_t + d, R + HP^s_tH^T)\]
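
For reference, a minimal sketch of this computation (impute_obs is a hypothetical helper; \(H\), \(d\), \(R\) are the model parameters above and m_s, P_s are the smoothed state mean and covariance):

def impute_obs(H, d, R, m_s, P_s):
    # p(y_t | x_t) = N(H m_s + d, R + H P_s H^T)
    return H @ m_s + d, R + H @ P_s @ H.mT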

The Kalman Filter is an algorithm designed to estimate \(P(x_t | y_{0:t})\). As all state transitions and observations are linear with Gaussian-distributed noise, these distributions can be represented exactly as Gaussian distributions with mean ms[t] and covariance Ps[t]. Similarly, the Kalman Smoother is an algorithm designed to estimate \(P(x_t | y_{0:T})\), i.e. the state given all the observations.

Kalman Filter Base

Utils

The helper _add_batch_dim normalizes tensor shapes: a 1-D tensor becomes a batched column vector of shape (1, n, 1), a 2-D tensor gains a leading batch dimension, and a 3-D tensor is left unchanged, as the examples below show.

show_as_row(_add_batch_dim(torch.ones(2)), _add_batch_dim(torch.ones(2,2)), _add_batch_dim(torch.ones(2,2,2)))

#0

tensor([[[1.],
         [1.]]])

#1

tensor([[[1., 1.],
         [1., 1.]]])

#2

tensor([[[1., 1.],
         [1., 1.]],

        [[1., 1.],
         [1., 1.]]])
show_as_row(_add_batch_dim(torch.ones(2)).shape, _add_batch_dim(torch.ones(2,2)).shape, _add_batch_dim(torch.ones(2,2,2)).shape)

#0

torch.Size([1, 2, 1])

#1

torch.Size([1, 2, 2])

#2

torch.Size([2, 2, 2])

Kalman Filter Base


source

KalmanFilterBase

 KalmanFilterBase (A:torch.Tensor, H:torch.Tensor, B:torch.Tensor,
                   Q:torch.Tensor, R:torch.Tensor, b:torch.Tensor,
                   d:torch.Tensor, m0:torch.Tensor, P0:torch.Tensor,
                   n_dim_state:int=None, n_dim_obs:int=None,
                   n_dim_contr:int=None,
                   var_names:Optional[Iterable[str]]=None,
                   contr_names:Optional[Iterable[str]]=None,
                   cov_checker:meteo_imp.gaussian.CheckPosDef|None=None,
                   use_conditional:bool=False, use_control:bool=True,
                   use_smooth:bool=True, pred_only_gap:bool=False,
                   pred_std:bool=False)

Base class for handling Kalman Filter implementation in PyTorch

Type Default Details
A Tensor [n_dim_state,n_dim_state] \(A\), state transition matrix
H Tensor [n_dim_obs, n_dim_state] \(H\), observation matrix
B Tensor [n_dim_state, n_dim_contr] \(B\) control matrix
Q Tensor [n_dim_state, n_dim_state] \(Q\), state transition covariance matrix
R Tensor [n_dim_obs, n_dim_obs] \(R\), observations covariance matrix
b Tensor [n_dim_state] \(b\), state transition offset
d Tensor [n_dim_obs] \(d\), observations offset
m0 Tensor [n_dim_state] \(m_0\)
P0 Tensor [n_dim_state, n_dim_state] \(P_0\)
n_dim_state int None Number of dimensions for state - default inferred from parameters
n_dim_obs int None Number of dimensions for observations - default inferred from parameters
n_dim_contr int None Number of dimensions for control - default inferred from parameters
var_names Optional None Names of variables for printing
contr_names Optional None Names of control variables for printing
cov_checker meteo_imp.gaussian.CheckPosDef | None None Check covariance at every step
use_conditional bool False Use conditional distribution for gaps that don’t have all variables missing
use_control bool True Use the control in the filter
use_smooth bool True Use smoother for predictions (otherwise filter only)
pred_only_gap bool False if True, predictions are only for the gap
pred_std bool False return only stds and not covariances

Constructors

Giving all the parameters manually to the KalmanFilterBase init method is not convenient, hence there are some class methods that help initialize the class.

Due to a bug in fastcore it is not possible to subclass after creating the class methods.


source

KalmanFilter

 KalmanFilter (A:torch.Tensor, H:torch.Tensor, B:torch.Tensor,
               Q:torch.Tensor, R:torch.Tensor, b:torch.Tensor,
               d:torch.Tensor, m0:torch.Tensor, P0:torch.Tensor,
               n_dim_state:int=None, n_dim_obs:int=None,
               n_dim_contr:int=None,
               var_names:Optional[Iterable[str]]=None,
               contr_names:Optional[Iterable[str]]=None,
               cov_checker:meteo_imp.gaussian.CheckPosDef|None=None,
               use_conditional:bool=False, use_control:bool=True,
               use_smooth:bool=True, pred_only_gap:bool=False,
               pred_std:bool=False)

Base class for handling Kalman Filter implementation in PyTorch

Type Default Details
A Tensor [n_dim_state,n_dim_state] \(A\), state transition matrix
H Tensor [n_dim_obs, n_dim_state] \(H\), observation matrix
B Tensor [n_dim_state, n_dim_contr] \(B\) control matrix
Q Tensor [n_dim_state, n_dim_state] \(Q\), state transition covariance matrix
R Tensor [n_dim_obs, n_dim_obs] \(R\), observations covariance matrix
b Tensor [n_dim_state] \(b\), state transition offset
d Tensor [n_dim_obs] \(d\), observations offset
m0 Tensor [n_dim_state] \(m_0\)
P0 Tensor [n_dim_state, n_dim_state] \(P_0\)
n_dim_state int None Number of dimensions for state - default inferred from parameters
n_dim_obs int None Number of dimensions for observations - default inferred from parameters
n_dim_contr int None Number of dimensions for control - default inferred from parameters
var_names Optional None Names of variables for printing
contr_names Optional None Names of control variables for printing
cov_checker meteo_imp.gaussian.CheckPosDef | None None Check covariance at every step
use_conditional bool False Use conditional distribution for gaps that don’t have all variables missing
use_control bool True Use the control in the filter
use_smooth bool True Use smoother for predictions (otherwise filter only)
pred_only_gap bool False if True, predictions are only for the gap
pred_std bool False return only stds and not covariances

source

KalmanFilterSR

 KalmanFilterSR (A:torch.Tensor, H:torch.Tensor, B:torch.Tensor,
                 Q:torch.Tensor, R:torch.Tensor, b:torch.Tensor,
                 d:torch.Tensor, m0:torch.Tensor, P0:torch.Tensor,
                 n_dim_state:int=None, n_dim_obs:int=None,
                 n_dim_contr:int=None,
                 var_names:Optional[Iterable[str]]=None,
                 contr_names:Optional[Iterable[str]]=None,
                 cov_checker:meteo_imp.gaussian.CheckPosDef|None=None,
                 use_conditional:bool=False, use_control:bool=True,
                 use_smooth:bool=True, pred_only_gap:bool=False,
                 pred_std:bool=False)

Base class for handling Kalman Filter implementation in PyTorch

Type Default Details
A Tensor [n_dim_state,n_dim_state] \(A\), state transition matrix
H Tensor [n_dim_obs, n_dim_state] \(H\), observation matrix
B Tensor [n_dim_state, n_dim_contr] \(B\) control matrix
Q Tensor [n_dim_state, n_dim_state] \(Q\), state transition covariance matrix
R Tensor [n_dim_obs, n_dim_obs] \(R\), observations covariance matrix
b Tensor [n_dim_state] \(b\), state transition offset
d Tensor [n_dim_obs] \(d\), observations offset
m0 Tensor [n_dim_state] \(m_0\)
P0 Tensor [n_dim_state, n_dim_state] \(P_0\)
n_dim_state int None Number of dimensions for state - default inferred from parameters
n_dim_obs int None Number of dimensions for observations - default inferred from parameters
n_dim_contr int None Number of dimensions for control - default inferred from parameters
var_names Optional None Names of variables for printing
contr_names Optional None Names of control variables for printing
cov_checker meteo_imp.gaussian.CheckPosDef | None None Check covariance at every step
use_conditional bool False Use conditional distribution for gaps that don’t have all variables missing
use_control bool True Use the control in the filter
use_smooth bool True Use smoother for predictions (otherwise filter only)
pred_only_gap bool False if True, predictions are only for the gap
pred_std bool False return only stds and not covariances

Random parameters

kB = KalmanFilterBase.init_random(3,4, 3, dtype=torch.float64)
kB
Kalman Filter
        N dim obs: 3,
        N dim state: 4,
        N dim contr: 3
kB.Q
tensor([[[1.5725, 0.3829, 0.0843, 0.3637],
         [0.3829, 1.4430, 0.3358, 1.1988],
         [0.0843, 0.3358, 1.7816, 0.7411],
         [0.3637, 1.1988, 0.7411, 1.6773]]], dtype=torch.float64,
       grad_fn=<UnsafeViewBackward0>)
kB.Q_C
tensor([[[1.2540, 0.0000, 0.0000, 0.0000],
         [0.3053, 1.1618, 0.0000, 0.0000],
         [0.0672, 0.2714, 1.3052, 0.0000],
         [0.2901, 0.9556, 0.3542, 0.7446]]], dtype=torch.float64,
       grad_fn=<DiagonalScatterBackward0>)
test_close(kB.Q_C @ kB.Q_C.mT, kB.Q, eps=2e-5)
kB.P0 = to_posdef(torch.rand(1, 4, 4, dtype=torch.float64))

check that assignment works :)

kB.P0 = to_posdef(torch.rand(4, 4, dtype=torch.float64))
kB.P0_C
tensor([[0.9349, 0.0000, 0.0000, 0.0000],
        [0.3928, 1.0748, 0.0000, 0.0000],
        [0.7406, 0.7533, 0.8326, 0.0000],
        [0.5903, 0.0391, 0.8217, 0.9638]], dtype=torch.float64,
       grad_fn=<DiagonalScatterBackward0>)
kB = KalmanFilterBase.init_random(3,4, 3, dtype=torch.float64)
kB
Kalman Filter
        N dim obs: 3,
        N dim state: 4,
        N dim contr: 3
list(kB.named_parameters())
[('A',
  Parameter containing:
  tensor([[[0.1751, 0.5375, 0.7676, 0.7450],
           [0.1204, 0.4777, 0.5823, 0.3786],
           [0.8484, 0.2317, 0.3969, 0.7088],
           [0.0270, 0.6178, 0.6097, 0.9128]]], dtype=torch.float64,
         requires_grad=True)),
 ('H',
  Parameter containing:
  tensor([[[0.4984, 0.1597, 0.4458, 0.7349],
           [0.1070, 0.9531, 0.1942, 0.6683],
           [0.9186, 0.7123, 0.1806, 0.5045]]], dtype=torch.float64,
         requires_grad=True)),
 ('B',
  Parameter containing:
  tensor([[[0.2827, 0.1138, 0.9378],
           [0.5594, 0.9364, 0.5136],
           [0.8592, 0.7647, 0.5183],
           [0.2376, 0.5618, 0.5096]]], dtype=torch.float64, requires_grad=True)),
 ('Q_raw',
  Parameter containing:
  tensor([[[0.4590, 0.0000, 0.0000, 0.0000],
           [0.5348, 0.5296, 0.0000, 0.0000],
           [0.0025, 0.0749, 0.4603, 0.0000],
           [0.0588, 0.5416, 0.9309, 0.5859]]], dtype=torch.float64,
         requires_grad=True)),
 ('R_raw',
  Parameter containing:
  tensor([[[0.1393, 0.0000, 0.0000],
           [0.0249, 0.7613, 0.0000],
           [0.7829, 0.4311, 0.9199]]], dtype=torch.float64, requires_grad=True)),
 ('b',
  Parameter containing:
  tensor([[[0.4661],
           [0.3918],
           [0.0571],
           [0.9529]]], dtype=torch.float64, requires_grad=True)),
 ('d',
  Parameter containing:
  tensor([[[0.9334],
           [0.5645],
           [0.0695]]], dtype=torch.float64, requires_grad=True)),
 ('m0',
  Parameter containing:
  tensor([[[0.8234],
           [0.3850],
           [0.3380],
           [0.4376]]], dtype=torch.float64, requires_grad=True)),
 ('P0_raw',
  Parameter containing:
  tensor([[[0.4467, 0.0000, 0.0000, 0.0000],
           [0.4818, 0.9502, 0.0000, 0.0000],
           [0.4770, 0.3478, 0.9812, 0.0000],
           [0.2460, 0.9602, 0.6476, 0.2647]]], dtype=torch.float64,
         requires_grad=True))]
kB.state_dict()
OrderedDict([('A',
              tensor([[[0.1751, 0.5375, 0.7676, 0.7450],
                       [0.1204, 0.4777, 0.5823, 0.3786],
                       [0.8484, 0.2317, 0.3969, 0.7088],
                       [0.0270, 0.6178, 0.6097, 0.9128]]], dtype=torch.float64)),
             ('H',
              tensor([[[0.4984, 0.1597, 0.4458, 0.7349],
                       [0.1070, 0.9531, 0.1942, 0.6683],
                       [0.9186, 0.7123, 0.1806, 0.5045]]], dtype=torch.float64)),
             ('B',
              tensor([[[0.2827, 0.1138, 0.9378],
                       [0.5594, 0.9364, 0.5136],
                       [0.8592, 0.7647, 0.5183],
                       [0.2376, 0.5618, 0.5096]]], dtype=torch.float64)),
             ('Q_raw',
              tensor([[[0.4590, 0.0000, 0.0000, 0.0000],
                       [0.5348, 0.5296, 0.0000, 0.0000],
                       [0.0025, 0.0749, 0.4603, 0.0000],
                       [0.0588, 0.5416, 0.9309, 0.5859]]], dtype=torch.float64)),
             ('R_raw',
              tensor([[[0.1393, 0.0000, 0.0000],
                       [0.0249, 0.7613, 0.0000],
                       [0.7829, 0.4311, 0.9199]]], dtype=torch.float64)),
             ('b',
              tensor([[[0.4661],
                       [0.3918],
                       [0.0571],
                       [0.9529]]], dtype=torch.float64)),
             ('d',
              tensor([[[0.9334],
                       [0.5645],
                       [0.0695]]], dtype=torch.float64)),
             ('m0',
              tensor([[[0.8234],
                       [0.3850],
                       [0.3380],
                       [0.4376]]], dtype=torch.float64)),
             ('P0_raw',
              tensor([[[0.4467, 0.0000, 0.0000, 0.0000],
                       [0.4818, 0.9502, 0.0000, 0.0000],
                       [0.4770, 0.3478, 0.9812, 0.0000],
                       [0.2460, 0.9602, 0.6476, 0.2647]]], dtype=torch.float64))])

From filter


init_from

k1 = KalmanFilter.init_random(3,4,3)
k2 = KalmanFilterSR.init_from(k1)
for p1, p2 in zip(k1.parameters(), k2.parameters()):
    test_close(p1, p2, eps=1e-3) # noise added by constraints

Get Info


source

KalmanFilterBase.get_info

 KalmanFilterBase.get_info ()
kB

Kalman Filter (3 obs, 4 state, 3 contr)

$A$

state x_0 x_1 x_2 x_3
x_0 0.1751 0.5375 0.7676 0.7450
x_1 0.1204 0.4777 0.5823 0.3786
x_2 0.8484 0.2317 0.3969 0.7088
x_3 0.0270 0.6178 0.6097 0.9128

$Q$

state x_0 x_1 x_2 x_3
x_0 0.9002 0.5074 0.0024 0.0558
x_1 0.5074 1.2712 0.0756 0.5690
x_2 0.0024 0.0756 0.9072 0.9246
x_3 0.0558 0.5690 0.9246 2.2208

$b$

state offset
x_0 0.4661
x_1 0.3918
x_2 0.0571
x_3 0.9529

$H$

variable x_0 x_1 x_2 x_3
y_0 0.4984 0.1597 0.4458 0.7349
y_1 0.1070 0.9531 0.1942 0.6683
y_2 0.9186 0.7123 0.1806 0.5045

$R$

variable y_0 y_1 y_2
y_0 0.5856 0.0190 0.5991
y_1 0.0190 1.3107 0.5130
y_2 0.5991 0.5130 2.3748

$d$

variable offset
y_0 0.9334
y_1 0.5645
y_2 0.0695

$B$

state c_0 c_1 c_2
x_0 0.2827 0.1138 0.9378
x_1 0.5594 0.9364 0.5136
x_2 0.8592 0.7647 0.5183
x_3 0.2376 0.5618 0.5096

$m_0$

state mean
x_0 0.8234
x_1 0.3850
x_2 0.3380
x_3 0.4376

$P_0$

state x_0 x_1 x_2 x_3
x_0 0.8860 0.4535 0.4490 0.2316
x_1 0.4535 1.8632 0.6740 1.3448
x_2 0.4490 0.6740 2.0374 1.2929
x_3 0.2316 1.3448 1.2929 2.0978

Test data

reset_seed()
data, mask, control = get_test_data()
show_as_row(data, mask, control)

data

tensor([[[0.8775,    nan,    nan],
         [0.6706,    nan, 0.9272],
         [   nan, 0.4967,    nan],
         [   nan,    nan,    nan],
         [   nan,    nan, 0.4760],
         [0.7606, 0.7759, 0.5243],
         [0.3714, 0.0426, 0.2343],
         [0.9991, 0.1775,    nan],
         [0.6734,    nan, 0.6468],
         [0.5825, 0.4599, 0.7960]],

        [[0.9038, 0.9735, 0.6428],
         [0.3725, 0.2052,    nan],
         [0.4448, 0.5775, 0.7237],
         [0.5927,    nan, 0.6441],
         [   nan, 0.9132, 0.0329],
         [0.4856, 0.9927, 0.5895],
         [0.2611, 0.9413, 0.1371],
         [0.8726, 0.5590, 0.8451],
         [0.1253, 0.9434, 0.0462],
         [0.2360, 0.0239, 0.8950]]], dtype=torch.float64)

mask

tensor([[[ True, False, False],
         [ True, False,  True],
         [False,  True, False],
         [False, False, False],
         [False, False,  True],
         [ True,  True,  True],
         [ True,  True,  True],
         [ True,  True, False],
         [ True, False,  True],
         [ True,  True,  True]],

        [[ True,  True,  True],
         [ True,  True, False],
         [ True,  True,  True],
         [ True, False,  True],
         [False,  True,  True],
         [ True,  True,  True],
         [ True,  True,  True],
         [ True,  True,  True],
         [ True,  True,  True],
         [ True,  True,  True]]])

control

tensor([[[0.9201, 0.6883, 0.5342],
         [0.7847, 0.3137, 0.1778],
         [0.5838, 0.9799, 0.3611],
         [0.3155, 0.7475, 0.5450],
         [0.5641, 0.2493, 0.8323],
         [0.9723, 0.1883, 0.3605],
         [0.5344, 0.3443, 0.7696],
         [0.3410, 0.7553, 0.3177],
         [0.0315, 0.5209, 0.6514],
         [0.3131, 0.4510, 0.3550]],

        [[0.4790, 0.0676, 0.3606],
         [0.7299, 0.6713, 0.3134],
         [0.7460, 0.1291, 0.4653],
         [0.5693, 0.9906, 0.8288],
         [0.9039, 0.5240, 0.6277],
         [0.3574, 0.0076, 0.6530],
         [0.8667, 0.9368, 0.8667],
         [0.6749, 0.3526, 0.6618],
         [0.0837, 0.7188, 0.7247],
         [0.3211, 0.4898, 0.9030]]], dtype=torch.float64)

Standard Kalman Filter

k = KalmanFilter.init_random(3,4,3)

Filter

Filter predict

Probability of the state at time \(t\) given the state at time \(t-1\) (a direct transcription of the equations is sketched after the list)

\(p(x_t) = \mathcal{N}(x_t; m_t^-, P_t^-)\) where:

  • predicted state mean: \(m_t^- = Am_{t-1} + B c_t + b\)

  • predicted state covariance: \(P_t^- = AP_{t-1}A^T + Q\)
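
As a reference, a direct transcription of these two equations (a sketch only; the notebook's _filter_predict and _filter_predict_cov_stand below are the actual helpers):

def predict_step(A, Q, b, B, m_prev, P_prev, c_t):
    m_minus = A @ m_prev + B @ c_t + b   # m_t^- = A m_{t-1} + B c_t + b
    P_minus = A @ P_prev @ A.mT + Q      # P_t^- = A P_{t-1} A^T + Q
    return m_minus, P_minus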

A, Q, b, B, m_pr,P_pr= (k.A, k.Q, k.b, k.B,torch.concat([k.m0]*2), torch.concat([k.P0]*2))
m_pr.shape, P_pr.shape, A.shape
(torch.Size([2, 4, 1]), torch.Size([2, 4, 4]), torch.Size([1, 4, 4]))

Covariance

Implement \(P_t^- = AP_{t-1}A^T + Q\)

P_m = _filter_predict_cov_stand(A, Q, P_pr)
P_m
tensor([[[6.2873, 3.7655, 4.8718, 5.8589],
         [3.7655, 4.8551, 4.7314, 5.1009],
         [4.8718, 4.7314, 5.9373, 6.0160],
         [5.8589, 5.1009, 6.0160, 8.5954]],

        [[6.2873, 3.7655, 4.8718, 5.8589],
         [3.7655, 4.8551, 4.7314, 5.1009],
         [4.8718, 4.7314, 5.9373, 6.0160],
         [5.8589, 5.1009, 6.0160, 8.5954]]], dtype=torch.float64,
       grad_fn=<AddBackward0>)

Mean

Predict

B.shape
torch.Size([1, 4, 3])
control.shape
torch.Size([2, 10, 3])
B[0].shape
torch.Size([4, 3])
B[0] @ control[0, 0].unsqueeze(-1)
tensor([[0.7217],
        [0.9572],
        [0.5742],
        [1.2138]], dtype=torch.float64, grad_fn=<MmBackward0>)
m_m, P_m = _filter_predict(
    A, Q, b, B,
    m_pr,P_pr, control[:, 0, :].unsqueeze(-1))
show_as_row(m_m, P_m)

m_m

tensor([[[2.4986],
         [2.2429],
         [1.8833],
         [3.6164]],

        [[2.1377],
         [1.5964],
         [1.5371],
         [2.8315]]], dtype=torch.float64, grad_fn=)

P_m

tensor([[[6.2873, 3.7655, 4.8718, 5.8589],
         [3.7655, 4.8551, 4.7314, 5.1009],
         [4.8718, 4.7314, 5.9373, 6.0160],
         [5.8589, 5.1009, 6.0160, 8.5954]],

        [[6.2873, 3.7655, 4.8718, 5.8589],
         [3.7655, 4.8551, 4.7314, 5.1009],
         [4.8718, 4.7314, 5.9373, 6.0160],
         [5.8589, 5.1009, 6.0160, 8.5954]]], dtype=torch.float64,
       grad_fn=)
(m_m.shape, P_m.shape,)
(torch.Size([2, 4, 1]), torch.Size([2, 4, 4]))

Filter update

Probability of the state at time \(t\) given the observations at time \(t\)

\(p(x_t|y_t) = \mathcal{N}(x_t; m_t, P_t)\) where:

  • predicted obs mean: \(z_t = Hm_t^- + d\)

  • predicted obs covariance: \(S_t = HP_t^-H^T + R\)

  • Kalman gain: \(K_t = P_t^-H^TS_t^{-1}\)

  • corrected state mean: \(m_t = m_t^- + K_t(y_t - z_t)\)

  • corrected state covariance: \(P_t = (I-K_tH)P_t^-\)

If the observations are missing, this step is skipped and the corrected state is equal to the predicted state.

Need to figure out the NaNs for the gradients …

Kalman Gain

Don't compute the inverse of the matrix explicitly; instead use cholesky_solve to solve the corresponding linear system (a sketch follows).
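
A minimal sketch of this idea (k_gain_via_cholesky is a hypothetical helper, not necessarily the implementation of _filter_update_k_gain): factor \(S_t\) once with Cholesky and let torch.cholesky_solve do the solve, so \(S_t^{-1}\) is never formed explicitly.

import torch

def k_gain_via_cholesky(H, R, P_m):
    S = H @ P_m @ H.mT + R            # innovation covariance S_t
    S_C = torch.linalg.cholesky(S)    # S = S_C S_C^T
    # cholesky_solve solves S X = B, so X = S^{-1} H P^-, and K = X^T
    return torch.cholesky_solve(H @ P_m, S_C).mT  # K = P^- H^T S^{-1}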

H, d, R, obs = k.H, k.d, k.R, data[:,0,:].unsqueeze(-1)
K = _filter_update_k_gain(H, R, P_m)
K
tensor([[[ 0.1177,  0.2313,  0.0612],
         [-0.5573,  0.6582,  0.0288],
         [-0.3189,  0.5919, -0.0174],
         [ 0.3269,  0.3056, -0.0378]],

        [[ 0.1177,  0.2313,  0.0612],
         [-0.5573,  0.6582,  0.0288],
         [-0.3189,  0.5919, -0.0174],
         [ 0.3269,  0.3056, -0.0378]]], dtype=torch.float64,
       grad_fn=<TransposeBackward0>)
test_close(_filter_update_k_gain(H, R, P_m), P_m @ H.mT @ torch.inverse(H @ P_m @ H.mT + R))

Covariance

P = _filter_update_cov(H, K, P_m)
P
tensor([[[ 2.1074, -0.2158,  0.3602,  0.1911],
         [-0.2158,  0.4808,  0.0472, -0.1503],
         [ 0.3602,  0.0472,  0.8035, -0.0177],
         [ 0.1911, -0.1503, -0.0177,  0.8448]],

        [[ 2.1074, -0.2158,  0.3602,  0.1911],
         [-0.2158,  0.4808,  0.0472, -0.1503],
         [ 0.3602,  0.0472,  0.8035, -0.0177],
         [ 0.1911, -0.1503, -0.0177,  0.8448]]], dtype=torch.float64,
       grad_fn=<UnsafeViewBackward0>)

Mean

z = H @ m_m + d; z
(obs - z)
tensor([[[-3.3343],
         [    nan],
         [    nan]],

        [[-2.4181],
         [-5.0754],
         [-3.9145]]], dtype=torch.float64, grad_fn=<SubBackward0>)
m = _filter_update_mean(H, d, K, m_m, obs)
m
tensor([[[    nan],
         [    nan],
         [    nan],
         [    nan]],

        [[ 0.4396],
         [-0.5094],
         [-0.6276],
         [ 0.6379]]], dtype=torch.float64, grad_fn=<AddBackward0>)
m, P = _filter_update(H, d, R, m_m, P_m, obs)
show_as_row(m, P)
m.shape, P.shape

m

tensor([[[    nan],
         [    nan],
         [    nan],
         [    nan]],

        [[ 0.4396],
         [-0.5094],
         [-0.6276],
         [ 0.6379]]], dtype=torch.float64, grad_fn=)

P

tensor([[[ 2.1074, -0.2158,  0.3602,  0.1911],
         [-0.2158,  0.4808,  0.0472, -0.1503],
         [ 0.3602,  0.0472,  0.8035, -0.0177],
         [ 0.1911, -0.1503, -0.0177,  0.8448]],

        [[ 2.1074, -0.2158,  0.3602,  0.1911],
         [-0.2158,  0.4808,  0.0472, -0.1503],
         [ 0.3602,  0.0472,  0.8035, -0.0177],
         [ 0.1911, -0.1503, -0.0177,  0.8448]]], dtype=torch.float64,
       grad_fn=)
(torch.Size([2, 4, 1]), torch.Size([2, 4, 4]))

There are NaNs in the output because there are NaNs in the observations.

The next functions add support for missing observations by also using a mask.

Missing observations

If all the observations at time \(t\) are missing, the correction step is skipped and the filtered state at time \(t\) is the same as the predicted state.

If only some observations are missing, a variation of the update equations can be used.

\(y^{ng}_t\) is a vector containing the observations that are not missing at time \(t\).

It can be expressed as a linear transformation of \(y_t\)

\[ y^{ng}_t = My_t\]

where \(M\) is a mask matrix used to select the subset of \(y_t\) that is observed. \(M \in \mathbb{R}^{n_{ng} \times n}\): each row is all zeros except for a single 1 in the column corresponding to a non-missing observation. Hence:

\[ p(y^{ng}_t) = \mathcal{N}(M\mu_{y_t}, M\Sigma_{y_t}M^T)\]

from which you can derive

\[ p(y^{ng}_t|x_t) = \mathcal{N}(MHx_t + Md, MRM^T) \tag{1}\]

Then the posterior \(p(x_t|y_t^{ng})\) can be computed analogously to the standard update equations as:

\[ p(x_t|y^{ng}_t) = \mathcal{N}(x_t; m_t, P_t) \tag{2}\]

where:

  • predicted obs mean: \(z_t = MHm_t^- + Md\)
  • predicted obs covariance: \(S_t = MHP_t^-(MH)^T + MRM^T\)
  • Kalman gain \(K_t = P_t^-(MH)^TS_t^{-1}\)
  • corrected state mean: \(m_t = m_t^- + K_t(My_t - z_t)\)
  • corrected state covariance: \(P_t = (I-K_tMH)P_t^-\)
Implementation details

In the implementation, the matrix multiplication \(MH\) can be replaced with H[m], where m is a boolean mask over the rows of \(H\), and \(MRM^T\) with R[m][:,m]

H, R, d,obs, mm = k.H, k.R, k.d, data[:,0,:].unsqueeze(-1), mask[:,0,:].unsqueeze(-1)
m = torch.tensor([False,True,True]) # mask batch
M = torch.tensor([[[0,1,0], # mask matrix
                  [0,0,1]]], dtype=torch.float64)
show_as_row(m, M, H, R)

m

tensor([False,  True,  True])

M

tensor([[[0., 1., 0.],
         [0., 0., 1.]]], dtype=torch.float64)

H

Parameter containing:
tensor([[[0.1349, 0.0519, 0.1180, 0.9767],
         [0.1679, 0.8635, 0.3753, 0.9760],
         [0.2125, 0.8049, 0.2124, 0.6794]]], dtype=torch.float64,
       requires_grad=True)

R

tensor([[[1.6259, 1.1183, 0.8535],
         [1.1183, 1.2831, 0.9935],
         [0.8535, 0.9935, 2.4550]]], dtype=torch.float64,
       grad_fn=)
M @ M.mT
tensor([[[1., 0.],
         [0., 1.]]], dtype=torch.float64)
M @ H, H[:, m]
(tensor([[[0.1679, 0.8635, 0.3753, 0.9760],
          [0.2125, 0.8049, 0.2124, 0.6794]]], dtype=torch.float64,
        grad_fn=<UnsafeViewBackward0>),
 tensor([[[0.1679, 0.8635, 0.3753, 0.9760],
          [0.2125, 0.8049, 0.2124, 0.6794]]], dtype=torch.float64,
        grad_fn=<IndexBackward0>))
M @ R @ M.mT, R[:,m][:,:,m]
(tensor([[[1.2831, 0.9935],
          [0.9935, 2.4550]]], dtype=torch.float64,
        grad_fn=<UnsafeViewBackward0>),
 tensor([[[1.2831, 0.9935],
          [0.9935, 2.4550]]], dtype=torch.float64, grad_fn=<IndexBackward0>))

With partially missing observations, _filter_update cannot easily be batched, because the shapes of the intermediate variables depend on the number of observed variables. So the idea is to divide the batch into blocks that share the same pattern of missing variables; a sketch of this grouping is given just before the batched update below.

mask_values, indices = torch.unique(mask[:,1,:], dim=0, return_inverse=True)
mask_values, indices
(tensor([[ True, False,  True],
         [ True,  True, False]]),
 tensor([0, 1]))
Update mask
mm = mask[0,0,:]
H[:, mm].shape, d[:, mm].shape, R[:, mm][:, :,mm].shape, obs[:, mm].shape
(torch.Size([1, 1, 4]),
 torch.Size([1, 1, 1]),
 torch.Size([1, 1, 1]),
 torch.Size([2, 1, 1]))
show_as_row(*_filter_update_mask(H, d, R, m_m, P_m, obs, mask[0, 0, :] ))

#0

tensor([[[0.7184],
         [0.7151],
         [0.0696],
         [1.1526]],

        [[0.8467],
         [0.4883],
         [0.2217],
         [1.0446]]], dtype=torch.float64, grad_fn=)

#1

tensor([[[2.3679, 0.4016, 0.8785, 0.4342],
         [0.4016, 1.9680, 1.3041, 0.4451],
         [0.8785, 1.3041, 1.8687, 0.4890],
         [0.4342, 0.4451, 0.4890, 1.0874]],

        [[2.3679, 0.4016, 0.8785, 0.4342],
         [0.4016, 1.9680, 1.3041, 0.4451],
         [0.8785, 1.3041, 1.8687, 0.4890],
         [0.4342, 0.4451, 0.4890, 1.0874]]], dtype=torch.float64,
       grad_fn=)
m, P = _filter_update_mask(H, d, R, m_m, P_m, obs, mask[0, 0, :] )
m.shape, P.shape
(torch.Size([2, 4, 1]), torch.Size([2, 4, 4]))
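
Putting the pieces together, the batched update can be sketched (hypothetically; _filter_update_mask_batch below is the actual implementation) as grouping the batch elements by missingness pattern with torch.unique and calling _filter_update_mask once per group:

import torch

def update_by_pattern(H, d, R, m_m, P_m, obs, mask):
    patterns, idx = torch.unique(mask, dim=0, return_inverse=True)
    m_out, P_out = torch.empty_like(m_m), torch.empty_like(P_m)
    for i, pattern in enumerate(patterns):
        sel = idx == i  # batch elements sharing this missingness pattern
        m_out[sel], P_out[sel] = _filter_update_mask(
            H, d, R, m_m[sel], P_m[sel], obs[sel], pattern)
    return m_out, P_out
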
Update mask batch
m, P = _filter_update_mask_batch(H, d, R, m_m, P_m, obs, mask[:,0,:] )
show_as_row(m, P)
m.shape, P.shape

m

tensor([[[ 0.7184],
         [ 0.7151],
         [ 0.0696],
         [ 1.1526]],

        [[ 0.4396],
         [-0.5094],
         [-0.6276],
         [ 0.6379]]], dtype=torch.float64, grad_fn=)

P

tensor([[[ 2.3679,  0.4016,  0.8785,  0.4342],
         [ 0.4016,  1.9680,  1.3041,  0.4451],
         [ 0.8785,  1.3041,  1.8687,  0.4890],
         [ 0.4342,  0.4451,  0.4890,  1.0874]],

        [[ 2.1074, -0.2158,  0.3602,  0.1911],
         [-0.2158,  0.4808,  0.0472, -0.1503],
         [ 0.3602,  0.0472,  0.8035, -0.0177],
         [ 0.1911, -0.1503, -0.0177,  0.8448]]], dtype=torch.float64,
       grad_fn=)
(torch.Size([2, 4, 1]), torch.Size([2, 4, 4]))
m.sum().backward(retain_graph=True) # check that pytorch can compute gradients with the whole batch and gradients aren't nan
H.grad
tensor([[[-1.1769, -2.7620, -0.9158, -2.4954],
         [-2.0431,  0.8269,  0.5074, -1.5867],
         [ 0.1472,  0.0285,  0.1012,  0.0356]]], dtype=torch.float64)

Filter All

The recursive version of the Kalman filter apparently breaks PyTorch's gradient computation, so a workaround is needed: during the loop the states are saved in a Python list and at the end they are combined back into a tensor (a minimal sketch of this pattern follows the list). The last lines of the function:

  • convert the lists to tensors
  • put the dimensions in the correct order
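
A minimal sketch of the list-then-stack pattern (stack_states is a hypothetical helper, not the library code):

import torch

def stack_states(state_list):
    "Combine per-timestep (mean, cov) pairs into tensors with a time dimension."
    ms = torch.stack([m for m, _ in state_list], dim=1)  # (n_batch, n_timesteps, n_dim_state, 1)
    Ps = torch.stack([P for _, P in state_list], dim=1)  # (n_batch, n_timesteps, n_dim_state, n_dim_state)
    return ms, Ps
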
filt_state, pred_state  = k._filter_all(data, mask, control)
(ms, Ps), (m_ms, P_ms) = filt_state, pred_state

Predictions at time 0 for both batches

show_as_row(*map(Self.shape(), (m_ms, P_ms, ms, Ps,)))

#0

torch.Size([2, 10, 4, 1])

#1

torch.Size([2, 10, 4, 4])

#2

torch.Size([2, 10, 4, 1])

#3

torch.Size([2, 10, 4, 4])
show_as_row(*map(lambda x:x[0][0], (m_ms, P_ms, ms, Ps,)))

#0

tensor([[0.6498],
        [0.3257],
        [0.8462],
        [0.7727]], dtype=torch.float64, grad_fn=)

#1

tensor([[0.7511, 0.6229, 0.6414, 0.4561],
        [0.6229, 1.2138, 0.6974, 0.9224],
        [0.6414, 0.6974, 1.4388, 1.4051],
        [0.4561, 0.9224, 1.4051, 3.1638]], dtype=torch.float64,
       grad_fn=)

#2

tensor([[0.6392],
        [0.3073],
        [0.8191],
        [0.7180]], dtype=torch.float64, grad_fn=)

#3

tensor([[0.6695, 0.4821, 0.4340, 0.0368],
        [0.4821, 0.9707, 0.3394, 0.1987],
        [0.4340, 0.3394, 0.9115, 0.3391],
        [0.0368, 0.1987, 0.3391, 1.0092]], dtype=torch.float64,
       grad_fn=)

Filter

The filter method wraps _filter_all but in addition:

  • returns only the filtered state
  • removes the last dimension from the mean

source

KalmanFilter.filter

 KalmanFilter.filter (obs:torch.Tensor, mask:torch.Tensor,
                      control:torch.Tensor)

Filter observation

Type Details
obs Tensor ([n_batches], n_obs, [self.n_dim_obs]) where n_batches and n_dim_obs dimensions can be omitted if 1
mask Tensor ([n_batches], n_obs, [self.n_dim_obs]) where n_batches and n_dim_obs dimensions can be omitted if 1
control Tensor ([n_batches], n_obs, [self.n_dim_contr])
Returns ListMultiNormal Filtered state (n_batches, n_obs, self.n_dim_state)
filt = k.filter(data, mask, control)
filt.mean.shape, filt.cov.shape
(torch.Size([2, 10, 4, 1]), torch.Size([2, 10, 4, 4]))

Smooth

Smooth update step

Compute the probability of the state at time \(t\) given all the observations; a direct transcription of the following equations is sketched after the list.

\(p(x_t|Y) = \mathcal{N}(x_t; m_t^s, P_t^s)\) where:

  • Kalman smoothing gain: \(G_t = P_tA^T(P_{t+1}^-)^{-1}\)
  • smoothed mean: \(m_t^s = m_t + G_t(m_{t+1}^s - m_{t+1}^-)\)
  • smoothed covariance: \(P_t^s = P_t + G_t(P_{t+1}^s - P_{t+1}^-)G_t^T\)
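
A direct transcription of these three equations (a sketch using torch.inverse for readability; _smooth_gain, _smooth_mean and _smooth_cov below are the actual helpers):

import torch

def smooth_step(A, m_f, P_f, m_pred, P_pred, m_next_s, P_next_s):
    G = P_f @ A.mT @ torch.inverse(P_pred)      # smoothing gain G_t
    m_s = m_f + G @ (m_next_s - m_pred)         # smoothed mean m_t^s
    P_s = P_f + G @ (P_next_s - P_pred) @ G.mT  # smoothed covariance P_t^s
    return m_s, P_s
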
test_close(_smooth_gain(A, filt_state.cov, pred_state.cov), filt_state.cov @ A.mT @ torch.inverse(pred_state.cov))
K_p = _smooth_gain(A, filt_state[:,0].cov, pred_state[:,0].cov)
K_p.shape
torch.Size([2, 4, 4])
_smooth_mean(K_p, filt_state[:,0].mean, pred_state[:,0].mean, filt_state[:,0].mean)
tensor([[[ 0.6254],
         [ 0.2922],
         [ 0.7965],
         [ 0.6964]],

        [[-0.1829],
         [-0.7088],
         [-0.0211],
         [ 0.2570]]], dtype=torch.float64, grad_fn=<AddBackward0>)
_smooth_cov(K_p, filt_state[:,0].cov, pred_state[:,0].cov, filt_state[:,0].cov)
tensor([[[ 0.5327,  0.3321,  0.2094, -0.1773],
         [ 0.3321,  0.8061,  0.0930, -0.0363],
         [ 0.2094,  0.0930,  0.5427, -0.0125],
         [-0.1773, -0.0363, -0.0125,  0.6738]],

        [[ 0.3342,  0.0671,  0.0652, -0.2001],
         [ 0.0671,  0.3478, -0.1115, -0.1245],
         [ 0.0652, -0.1115,  0.4549, -0.0025],
         [-0.2001, -0.1245, -0.0025,  0.6981]]], dtype=torch.float64,
       grad_fn=<DivBackward0>)
show_as_row(*_smooth_update(A, filt_state[:, 0], pred_state[:, 0], filt_state[:, 0]))

#0

tensor([[[ 0.6254],
         [ 0.2922],
         [ 0.7965],
         [ 0.6964]],

        [[-0.1829],
         [-0.7088],
         [-0.0211],
         [ 0.2570]]], dtype=torch.float64, grad_fn=)

#1

tensor([[[ 0.5327,  0.3321,  0.2094, -0.1773],
         [ 0.3321,  0.8061,  0.0930, -0.0363],
         [ 0.2094,  0.0930,  0.5427, -0.0125],
         [-0.1773, -0.0363, -0.0125,  0.6738]],

        [[ 0.3342,  0.0671,  0.0652, -0.2001],
         [ 0.0671,  0.3478, -0.1115, -0.1245],
         [ 0.0652, -0.1115,  0.4549, -0.0025],
         [-0.2001, -0.1245, -0.0025,  0.6981]]], dtype=torch.float64,
       grad_fn=)
show_as_row(*map(Self.shape(), _smooth_update(A, MNormal(m_m, P_m), MNormal(m_m, P_m), MNormal(m_m, P_m))))

#0

torch.Size([2, 4, 1])

#1

torch.Size([2, 4, 4])

Smooth

smooth_state = _smooth(k.A,  filt_state, pred_state)
show_as_row(smooth_state.mean[0][0], smooth_state.cov[0][0])

#0

tensor([[-0.0785],
        [-0.5839],
        [-0.0959],
        [ 0.0031]], dtype=torch.float64, grad_fn=)

#1

tensor([[ 0.4578,  0.2300,  0.1755, -0.1481],
        [ 0.2300,  0.6676,  0.0346, -0.0171],
        [ 0.1755,  0.0346,  0.5665,  0.0642],
        [-0.1481, -0.0171,  0.0642,  0.7652]], dtype=torch.float64,
       grad_fn=)
show_as_row(smooth_state.mean.shape, smooth_state.cov.shape)

#0

torch.Size([2, 10, 4, 1])

#1

torch.Size([2, 10, 4, 4])

KalmanFilter method


source

KalmanFilter.smooth

 KalmanFilter.smooth (obs:torch.Tensor, mask:torch.Tensor,
                      control:torch.Tensor)

Kalman Filter Smoothing

Type Details
obs Tensor
mask Tensor
control Tensor
Returns ListMultiNormal [n_timesteps, n_dim_state] smoothed state
smoothed_state = k.smooth(data, mask, control)
show_as_row(smoothed_state.mean.shape, smoothed_state.cov.shape)

#0

torch.Size([2, 10, 4, 1])

#1

torch.Size([2, 10, 4, 4])
smoothed_state.mean.sum().backward(retain_graph=True)
A.grad
tensor([[[-4.9495, -5.2939, -7.8642,  0.2546],
         [ 0.2445,  8.3747, 13.5693, -7.4543],
         [-5.4811, -5.2497, -6.6092, -1.5187],
         [-1.5874,  2.1945,  3.6410, -2.3143]]], dtype=torch.float64)

Predict

The prediction at time \(t\) (\(y_t\)) is computed from the state (\(x_t\)) using this formula: \[p(y_t|x_t) = \mathcal{N}(Hx_t + d, R + HP^s_tH^T)\]

This works whether the state was filtered or smoothed.

This adds support for conditional predictions: at the time \(t\) for which we are making the predictions, some of the variables may actually have been observed. Since the model prediction is a normal distribution, we can condition on the observed values and thus improve the predictions. See conditional_gaussian.

For conditional predictions to make sense, the full covariance matrix of the predictions cannot be returned, only the standard deviations (a conditioning sketch follows).
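
A minimal sketch of the conditioning step for a single timestep (cond_pred is a hypothetical helper that parallels, rather than reproduces, _get_cond_pred / conditional_gaussian):

import torch

def cond_pred(mu, Sigma, y, m):
    "Condition N(mu, Sigma) on the observed entries y[m]; returns the moments of the missing part."
    S_oo = Sigma[m][:, m]                  # observed-observed block
    S_uo = Sigma[~m][:, m]                 # missing-observed block
    G = S_uo @ torch.inverse(S_oo)
    mu_u = mu[~m] + G @ (y[m] - mu[m])     # conditional mean
    S_uu = Sigma[~m][:, ~m] - G @ S_uo.mT  # conditional covariance
    return mu_u, S_uu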

test_m = torch.tensor(
    [[True, True, True,],
    [False, True, True],
    [False, False, False]]
)
torch.logical_xor(test_m.all(-1), test_m.any(-1)) # True for rows where some, but not all, variables are observed
tensor([False,  True, False])
A = torch.rand(2,2,3,3)
(A @ A).shape
torch.Size([2, 2, 3, 3])

predict can be vectorized across both the batch and the timesteps, except for timesteps that require conditional predictions

smoothed_state.mean.shape, smoothed_state.cov.shape
(torch.Size([2, 10, 4, 1]), torch.Size([2, 10, 4, 4]))
(k.H @ smoothed_state.mean).shape
torch.Size([2, 10, 3, 1])
pred_obs0 = k._obs_from_state(smoothed_state)
pred_obs0.mean.shape
torch.Size([2, 10, 3])
pred_obs0.cov.shape
torch.Size([2, 10, 3, 3])

source

KalmanFilter.predict

 KalmanFilter.predict (obs, mask, control, smooth=True)

Predicted observations at all times

pred = k.predict(data, mask, control)
pred.mean.shape, pred.std.shape
(torch.Size([2, 10, 3]), torch.Size([2, 10, 3]))

Gradients …

def get_grad_mask(x):
    "R_raw gradient after substituting the masked values with x"
    d = data.clone()
    d[~mask] = x  # fill the masked (missing) entries with x
    k.predict(d, mask, control).mean.sum().backward(retain_graph=True)
    grad = k.R_raw.grad.clone()
    k.zero_grad()
    return grad
get_grad_mask(10)
tensor([[[ -2.5082,   0.0000,   0.0000],
         [-11.7667,  -0.1206,   0.0000],
         [ -7.6345,  -8.3390,  -4.0905]]], dtype=torch.float64)
test_close(get_grad_mask(1), get_grad_mask(10))


@patch
def predict_times(self: KalmanFilter, times, obs, mask=None, smooth=True, check_args=None):
    """Predicted observations at specific times """
    state = self.smooth(obs, mask, check_args) if smooth else self.filter(obs, mask, check_args)
    obs, mask = self._parse_obs(obs, mask)
    times = array1d(times)
    
    n_timesteps = obs.shape[0]
    n_features = obs.shape[1] if len(obs.shape) > 1 else 1
    
    if times.max() > n_timesteps or times.min() < 0:
        raise ValueError(f"provided times range from {times.min()} to {times.max()}, which is outside allowed range : 0 to {n_timesteps}")

    means = torch.empty((times.shape[0], n_features), dtype=obs.dtype, device=obs.device)
    stds = torch.empty((times.shape[0], n_features), dtype=obs.dtype, device=obs.device) 
    for i, t in enumerate(times):
        mean, std = self._obs_from_state(
            state.mean[t],
            state.cov[t],
            {'t': t, **check_args} if check_args is not None else None
        )
        
        means[i], stds[i] = _get_cond_pred(ListNormal(mean, std), obs[t], mask[t])
    
    return ListNormal(means, stds)

Numerically Stable Kalman Filter (Square Root Filter)

Instead of propagating the full covariance matrices, the square root filter propagates their Cholesky factors, which keeps the covariances symmetric and positive definite despite rounding errors.

kSR = KalmanFilterSR.init_random(3,4,3)

Filter

Filter predict

Covariance

Implement the numerically stable version of the covariance update

A, Q, b, B, m_pr,P_pr= (k.A, k.Q, k.b, k.B,torch.concat([k.m0]*2), torch.concat([k.P0]*2))
Q_C = kSR.Q_C
_filter_predict_cov_stand(kSR.A, kSR.Q_C @ kSR.Q_C.mT, P_pr)
tensor([[[1.9504, 2.3535, 2.1727, 2.2936],
         [2.3535, 5.8436, 5.0464, 6.0831],
         [2.1727, 5.0464, 6.2940, 6.0969],
         [2.2936, 6.0831, 6.0969, 8.1538]],

        [[1.9504, 2.3535, 2.1727, 2.2936],
         [2.3535, 5.8436, 5.0464, 6.0831],
         [2.1727, 5.0464, 6.2940, 6.0969],
         [2.2936, 6.0831, 6.0969, 8.1538]]], dtype=torch.float64,
       grad_fn=<AddBackward0>)
P_pr_C = torch.linalg.cholesky(P_pr)

\[W = \begin{bmatrix}AC_{t-1}&C_Q\end{bmatrix}\]

W = torch.concat([A @ P_pr_C, Q_C.expand_as(P_pr_C)], dim=-1)
W.shape
torch.Size([2, 4, 8])
P_m_C = torch.linalg.qr(W.mT).R.mT
P_m_C
tensor([[[-2.5709,  0.0000,  0.0000,  0.0000],
         [-1.7904,  1.1590,  0.0000,  0.0000],
         [-1.9907,  1.3402,  1.2995,  0.0000],
         [-2.2985,  0.5614,  0.8125, -1.0728]],

        [[-2.5709,  0.0000,  0.0000,  0.0000],
         [-1.7904,  1.1590,  0.0000,  0.0000],
         [-1.9907,  1.3402,  1.2995,  0.0000],
         [-2.2985,  0.5614,  0.8125, -1.0728]]], dtype=torch.float64,
       grad_fn=<TransposeBackward0>)
P_m_C @ P_m_C.mT
tensor([[[6.6096, 4.6031, 5.1178, 5.9093],
         [4.6031, 4.5489, 5.1174, 4.7660],
         [5.1178, 5.1174, 7.4476, 6.3838],
         [5.9093, 4.7660, 6.3838, 7.4094]],

        [[6.6096, 4.6031, 5.1178, 5.9093],
         [4.6031, 4.5489, 5.1174, 4.7660],
         [5.1178, 5.1174, 7.4476, 6.3838],
         [5.9093, 4.7660, 6.3838, 7.4094]]], dtype=torch.float64,
       grad_fn=<UnsafeViewBackward0>)
P_m = _filter_predict_cov_stand(A, Q_C @ Q_C.mT, P_pr)
test_close(P_m, P_m_C @ P_m_C.mT)
(P_m - P_m_C @ P_m_C.mT).max()
tensor(8.8818e-16, dtype=torch.float64, grad_fn=<MaxBackward1>)
test_P_m_C = torch.linalg.cholesky(P_m)
(test_P_m_C @ test_P_m_C.mT - P_m_C @ P_m_C.mT).max()
tensor(8.8818e-16, dtype=torch.float64, grad_fn=<MaxBackward1>)
(test_P_m_C - P_m_C).max()
tensor(5.1418, dtype=torch.float64, grad_fn=<MaxBackward1>)

The matrix square root is not unique! The factor obtained from the QR decomposition differs from the Cholesky factor, but the solution is correct

P_m_C = _filter_predict_cov_SR(A, Q_C, P_pr_C)
test_close(P_m_C @ P_m_C.mT, _filter_predict_cov_stand(A, Q_C @ Q_C.mT, P_pr))
def fuzz_filter_predict_cov_SR(n=10):
    for _ in range(n):
        kSR = KalmanFilterSR.init_random(5,10,4)
        A, Q_C, b, B, m_pr,P_pr = (kSR.A.unsqueeze(0), kSR.Q_C.unsqueeze(0), kSR.b.unsqueeze(-1),
                                                  kSR.B.unsqueeze(0),
                                                  torch.stack([kSR.m0]*2).unsqueeze(-1),
                                                  torch.stack([kSR.P0]*2))
        P_pr_C = torch.linalg.cholesky(P_pr)
        P_m_C = _filter_predict_cov_SR(A, Q_C, P_pr_C)
        test_close(P_m_C @ P_m_C.mT, _filter_predict_cov_stand(A, Q_C @ Q_C.mT, P_pr), eps=5e-13)
fuzz_filter_predict_cov_SR()

Predict

m_m, P_m_C = _filter_predict_SR(kSR.A, kSR.Q_C, kSR.b, kSR.B, m_pr, P_pr_C, control[:,0].unsqueeze(-1)) 
show_as_row(m_m, P_m_C)

m_m

tensor([[[1.9549],
         [2.4811],
         [3.3275],
         [3.5545]],

        [[1.1647],
         [2.3109],
         [2.4892],
         [2.8133]]], dtype=torch.float64, grad_fn=)

P_m_C

tensor([[[-1.3966,  0.0000,  0.0000,  0.0000],
         [-1.6852, -1.7331,  0.0000,  0.0000],
         [-1.5557, -1.3991,  1.3843,  0.0000],
         [-1.6423, -1.9130,  0.6253, -1.1858]],

        [[-1.3966,  0.0000,  0.0000,  0.0000],
         [-1.6852, -1.7331,  0.0000,  0.0000],
         [-1.5557, -1.3991,  1.3843,  0.0000],
         [-1.6423, -1.9130,  0.6253, -1.1858]]], dtype=torch.float64,
       grad_fn=)
is_posdef(P_m_C @ P_m_C.mT).all()
tensor(True)

Filter Update

def is_sr(x_C, x): return torch.allclose(x_C @ x_C.mT, x)

source

cat_2d

 cat_2d (x)
Details
x matrix as list of list of Tensor
x = [[]]

Example calculations

H, R, R_C, obs = kSR.H, kSR.R, kSR.R_C, data[:,0,:].unsqueeze(-1)
P_m = P_m_C @ P_m_C.mT

# use standard filter to compute expected result
S = H @ P_m @ H.mT + R
S_C = torch.linalg.cholesky(S)
K = _filter_update_k_gain(H, R, P_m)
K_bar = K @ S_C
P_stand = _filter_update_cov(H, K, P_m)
P_C_stand = torch.linalg.cholesky(P_stand)
assert all([is_sr(R_C, R), is_sr(S_C, S)])

\[M = \begin{bmatrix} R^{1/2} & H(P^-)^{1/2} \\ 0 & (P^-)^{1/2} \end{bmatrix}\]

M = cat_2d([[R_C.expand(2,-1,-1)              , H @ P_m_C], 
            [torch.zeros_like((H @ P_m_C).mT),  P_m_C]])
M[0]
tensor([[ 1.2478,  0.0000,  0.0000, -1.2662, -1.1753,  0.7242, -0.3448],
        [ 0.3190,  1.2801,  0.0000, -2.1349, -1.5926,  1.0952, -0.3843],
        [ 0.0341,  0.9601,  0.7766, -3.2461, -2.1290,  0.8250, -0.5179],
        [ 0.0000,  0.0000,  0.0000, -1.3966,  0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000, -1.6852, -1.7331,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000, -1.5557, -1.3991,  1.3843,  0.0000],
        [ 0.0000,  0.0000,  0.0000, -1.6423, -1.9130,  0.6253, -1.1858]],
       dtype=torch.float64, grad_fn=<SelectBackward0>)

\[V = \begin{bmatrix} S^{1/2} & 0 \\ \bar{K} & P^{1/2} \end{bmatrix}\]

# V from standard filter
V_stand = cat_2d([[S_C,   torch.zeros_like(K_bar.mT)],
                  [K_bar, P_C_stand]])
V_stand[0]
tensor([[ 2.2770,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
        [ 2.5904,  1.8631,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
        [ 3.2634,  2.2593,  1.3379,  0.0000,  0.0000,  0.0000,  0.0000],
        [ 0.7766,  0.5205,  0.6151,  0.8355,  0.0000,  0.0000,  0.0000],
        [ 1.8316,  0.8659,  0.9168, -0.1000,  0.9427,  0.0000,  0.0000],
        [ 2.0275,  0.9734,  0.2654, -0.0859,  0.2527,  1.0461,  0.0000],
        [ 2.2790,  0.9606,  0.6923, -0.4813,  0.4183,  0.2013,  1.0540]],
       dtype=torch.float64, grad_fn=<SelectBackward0>)
test_close(M @ M.mT, V_stand @ V_stand.mT)
V_sr = torch.linalg.qr(M.mT).R.mT
V_sr[0]
tensor([[-2.2770,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
        [-2.5904, -1.8631,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
        [-3.2634, -2.2593, -1.3379,  0.0000,  0.0000,  0.0000,  0.0000],
        [-0.7766, -0.5205, -0.6151,  0.8355,  0.0000,  0.0000,  0.0000],
        [-1.8316, -0.8659, -0.9168, -0.1000,  0.9427,  0.0000,  0.0000],
        [-2.0275, -0.9734, -0.2654, -0.0859,  0.2527, -1.0461,  0.0000],
        [-2.2790, -0.9606, -0.6923, -0.4813,  0.4183, -0.2013, -1.0540]],
       dtype=torch.float64, grad_fn=<SelectBackward0>)
test_close(V_sr @ V_sr.mT, V_stand @ V_stand.mT)
n_dim_obs = R_C.shape[-1]
P_C = V_sr[:, n_dim_obs:, n_dim_obs:]
P_C
tensor([[[ 0.8355,  0.0000,  0.0000,  0.0000],
         [-0.1000,  0.9427,  0.0000,  0.0000],
         [-0.0859,  0.2527, -1.0461,  0.0000],
         [-0.4813,  0.4183, -0.2013, -1.0540]],

        [[ 0.8355,  0.0000,  0.0000,  0.0000],
         [-0.1000,  0.9427,  0.0000,  0.0000],
         [-0.0859,  0.2527, -1.0461,  0.0000],
         [-0.4813,  0.4183, -0.2013, -1.0540]]], dtype=torch.float64,
       grad_fn=<SliceBackward0>)
is_sr(P_C, P_stand)
True

\(P_C\) computed with the QR decomposition is not the same as the one from the Cholesky decomposition, but that's okay! Both are valid square roots of the posterior covariance

(P_C == P_C_stand).all()
tensor(False)
(H @ P_m_C).shape
torch.Size([2, 3, 4])

Covariance


source

tensor_info

 tensor_info (x)
P_C, S_C = _filter_update_cov_SR(kSR.H, kSR.R_C, P_m_C)
def fuzz_filter_update_cov_SR(n=10):
    errs = []
    for _ in range(n):
        kSR = KalmanFilterSR.init_random(5,10,4)
        H, R_C, P_m = (kSR.H, kSR.R_C, torch.cat([kSR.P0]*5))
        R = R_C @ R_C.mT
        P_m_C = torch.linalg.cholesky(P_m)
        P_C, _ = _filter_update_cov_SR(H, R_C, P_m_C)
        K = _filter_update_k_gain(H, R, P_m)
        P_stand = _filter_update_cov(H, K, P_m)
        errs.append((P_C @ P_C.mT -  P_stand).abs().max().item())
    return torch.tensor(errs)
err = fuzz_filter_update_cov_SR(100)
assert err.max() < torch.tensor(1e-10)
err.median(), err.max()
(tensor(2.8588e-15), tensor(1.0214e-14))

Kalman Gain

Don't compute the inverse of the matrix explicitly; instead reuse \(S^{1/2}\) with cholesky_solve to solve the linear system (a sketch follows).
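
A plausible sketch (k_gain_SR_sketch is hypothetical, not necessarily _filter_update_k_gain_SR itself): reuse the factor \(S^{1/2}\) already produced by _filter_update_cov_SR and solve with torch.cholesky_solve:

import torch

def k_gain_SR_sketch(H, P_m_C, S_C):
    P_m = P_m_C @ P_m_C.mT  # rebuild P^- from its square root
    # K = P^- H^T S^{-1}, with S = S_C S_C^T
    return torch.cholesky_solve(H @ P_m, S_C).mT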

S = kSR.H @ P_m_C @ P_m_C.mT @ kSR.H.mT + kSR.R
test_close(kSR.R, kSR.R_C @ kSR.R_C.mT)
test_close(S_C @ S_C.mT, S)
_filter_update_k_gain_SR(H, P_m_C, S_C)
tensor([[[-0.0014, -0.2782,  0.4598],
         [ 0.2389, -0.3662,  0.6852],
         [ 0.2854,  0.2819,  0.1984],
         [ 0.3866, -0.1119,  0.5174]],

        [[-0.0014, -0.2782,  0.4598],
         [ 0.2389, -0.3662,  0.6852],
         [ 0.2854,  0.2819,  0.1984],
         [ 0.3866, -0.1119,  0.5174]]], dtype=torch.float64,
       grad_fn=<TransposeBackward0>)
test_close(_filter_update_k_gain(H, R, P_m), _filter_update_k_gain_SR(H, P_m_C, S_C))
def fuzz_kalman_gain_SR(n=10):
    errs = []
    for _ in range(n):
        kSR = KalmanFilterSR.init_random(5,10,4)
        H, R_C, P_m = (kSR.H, kSR.R_C, torch.cat([kSR.P0]*5))
        R = R_C @ R_C.mT
        P_m_C = torch.linalg.cholesky(P_m)
        P_C, S_C = _filter_update_cov_SR(H, R_C, P_m_C)
        K_stand = _filter_update_k_gain(H, R, P_m)
        K = _filter_update_k_gain_SR(H, P_m_C, S_C)
        errs.append((K_stand - K).abs().max().item())
    return torch.tensor(errs)
err = fuzz_kalman_gain_SR(100)
assert err.max() < torch.tensor(1e-10)
err.median(), err.max()
(tensor(4.5797e-15), tensor(2.5202e-14))

Measurement update

m, P_C = _filter_update_SR(H, d, R_C, m_m, P_m_C, obs)
show_as_row(m, P_C)
m.shape, P_C.shape

m

tensor([[[   nan],
         [   nan],
         [   nan],
         [   nan]],

        [[0.1197],
         [0.3457],
         [0.5033],
         [0.6044]]], dtype=torch.float64, grad_fn=)

P_C

tensor([[[ 0.8355,  0.0000,  0.0000,  0.0000],
         [-0.1000,  0.9427,  0.0000,  0.0000],
         [-0.0859,  0.2527, -1.0461,  0.0000],
         [-0.4813,  0.4183, -0.2013, -1.0540]],

        [[ 0.8355,  0.0000,  0.0000,  0.0000],
         [-0.1000,  0.9427,  0.0000,  0.0000],
         [-0.0859,  0.2527, -1.0461,  0.0000],
         [-0.4813,  0.4183, -0.2013, -1.0540]]], dtype=torch.float64,
       grad_fn=)
(torch.Size([2, 4, 1]), torch.Size([2, 4, 4]))
get_test_data(1, 5,4, bs=1)[1].shape
torch.Size([1, 1, 5])
def fuzz_filter_update_SR(n=10):
    errs = {'mean': [], 'cov': []}
    for _ in range(n):
        kSR = KalmanFilterSR.init_random(5,10,4)
        H, d, R, R_C, m_m, P_m = (kSR.H, kSR.d, kSR.R, kSR.R_C, torch.cat([kSR.m0]*5), torch.cat([kSR.P0]*5))
        obs = torch.randn_like(H @ m_m)
        P_m_C = torch.linalg.cholesky(P_m)
        mSR, P_m_C = _filter_update_SR(H, d, R_C, m_m, P_m_C, obs)
        m, P_C = _filter_update(H, d, R, m_m, P_m, obs)
        errs['mean'].append((mSR - m).abs().max().item())
        errs['cov'].append((P_C - P_m_C @ P_m_C.mT).abs().max().item())
    return pd.DataFrame(errs)

err = fuzz_filter_update_SR(100)

err.median(), err.max()
(mean    1.049161e-14
 cov     2.740863e-15
 dtype: float64,
 mean    5.573320e-14
 cov     1.099121e-14
 dtype: float64)

Missing observations

Update mask

Here we need to compute the square root of the masked \(R\) from scratch, because the mask cannot be applied directly to \(R^{1/2}\)

R
tensor([[[1.5571, 0.3980, 0.0426],
         [0.3980, 1.7403, 1.2398],
         [0.0426, 1.2398, 1.5260]]], dtype=torch.float64,
       grad_fn=<UnsafeViewBackward0>)
is_posdef(R)
tensor([True])
R_m = torch.tensor([[1.5571,  0.0426], [0.0426, 1.5259]])
R_m
tensor([[1.5571, 0.0426],
        [0.0426, 1.5259]])
is_posdef(R_m)
tensor(True)
m = [True, False, True]
is_posdef(R[:, m,:][:, :, m])
tensor([True])
H_m, d_m, R_m, R_C_m, obs_m, = H[:, m,:], d[:, m,:], R[:, m,:][:, :,m], R_C[:, m,:][:, :,m], obs[:, m]
R2 = R_m
R2_C_m = torch.linalg.cholesky(R_m)
is_sr(R_C_m, R_m)
False
_filter_update_SR(H_m, d_m, R_C_m, m_m, P_m_C, obs_m)[0] - _filter_update(H_m, d_m, R_m, m_m, P_m_C @ P_m_C.mT, obs_m)[0]
tensor([[[    nan],
         [    nan],
         [    nan],
         [    nan]],

        [[-0.1152],
         [-0.1834],
         [-0.1472],
         [-0.1785]]], dtype=torch.float64, grad_fn=<SubBackward0>)
_filter_update_SR(H_m, d_m, R2_C_m, m_m, P_m_C, obs_m)[0] - _filter_update(H_m, d_m, R_m, m_m, P_m_C @ P_m_C.mT, obs_m)[0]
tensor([[[        nan],
         [        nan],
         [        nan],
         [        nan]],

        [[-4.4409e-16],
         [-4.4409e-16],
         [ 0.0000e+00],
         [-8.8818e-16]]], dtype=torch.float64, grad_fn=<SubBackward0>)
show_as_row(*_filter_update_mask_SR(H, d, R_C, m_m, P_m_C, obs, mask[0, 0, :] ))

#0

tensor([[[1.3313],
         [1.0104],
         [1.6996],
         [1.7246]],

        [[0.7732],
         [1.3875],
         [1.4670],
         [1.6642]]], dtype=torch.float64, grad_fn=)

#1

tensor([[[ 1.1441,  0.0000,  0.0000,  0.0000],
         [ 0.7349,  1.3175,  0.0000,  0.0000],
         [ 0.4355,  0.5900, -1.1768,  0.0000],
         [ 0.3596,  1.0472, -0.3473, -1.1330]],

        [[ 1.1441,  0.0000,  0.0000,  0.0000],
         [ 0.7349,  1.3175,  0.0000,  0.0000],
         [ 0.4355,  0.5900, -1.1768,  0.0000],
         [ 0.3596,  1.0472, -0.3473, -1.1330]]], dtype=torch.float64,
       grad_fn=)
m, P_C = _filter_update_mask_SR(H, d, R_C, m_m, P_m_C, obs, mask[0, 0, :] )
m.shape, P_C.shape
(torch.Size([2, 4, 1]), torch.Size([2, 4, 4]))
mask[0,0].shape
torch.Size([3])
def fuzz_filter_update_SR(n=10):
    errs = {'mean': [], 'cov': []}
    for _ in range(n):
        kSR = KalmanFilterSR.init_random(5,10,4)
        H, d, R, R_C, m_m, P_m = (kSR.H, kSR.d, kSR.R, kSR.R_C, torch.cat([kSR.m0]*5), torch.cat([kSR.P0]*5))
        obs, mask, _  = get_test_data(1, 5, 4, bs=5)
        obs, mask = obs[:,0].unsqueeze(-1), mask[0,0]
        
        P_m_C = torch.linalg.cholesky(P_m)
        mSR, P_m_C = _filter_update_mask_SR(H, d, R, m_m, P_m_C, obs, mask)
        m, P_C = _filter_update_mask(H, d, R, m_m, P_m, obs, mask)
        errs['mean'].append((mSR - m).abs().max().item())
        errs['cov'].append((P_C - P_m_C @ P_m_C.mT).abs().max().item())
    return pd.DataFrame(errs)

err = fuzz_filter_update_SR(100)

err.median(), err.max()
(mean    3.996803e-15
 cov     1.970646e-15
 dtype: float64,
 mean    3.996803e-15
 cov     7.827072e-15
 dtype: float64)
Update mask batch
m, P_C = _filter_update_mask_batch_SR(H, d, R, m_m, P_m_C, obs, mask[:,0,:] )
show_as_row(m, P_C)
m.shape, P_C.shape

m

tensor([[[1.3685],
         [1.0982],
         [1.7967],
         [1.8337]],

        [[0.1197],
         [0.3457],
         [0.5033],
         [0.6044]]], dtype=torch.float64, grad_fn=)

P_C

tensor([[[ 1.1607,  0.0000,  0.0000,  0.0000],
         [ 0.8022,  1.3585,  0.0000,  0.0000],
         [ 0.5153,  0.6769, -1.2081,  0.0000],
         [ 0.4512,  1.1387, -0.3915, -1.1430]],

        [[ 0.8355,  0.0000,  0.0000,  0.0000],
         [-0.1000,  0.9427,  0.0000,  0.0000],
         [-0.0859,  0.2527, -1.0461,  0.0000],
         [-0.4813,  0.4183, -0.2013, -1.0540]]], dtype=torch.float64,
       grad_fn=)
(torch.Size([2, 4, 1]), torch.Size([2, 4, 4]))
m.sum().backward(retain_graph=True) # check that pytorch can compute gradients with the whole batch and gradients aren't nan
H.grad
tensor([[[-5.3359, -5.2561, -7.0879, -7.6131],
         [ 0.0176, -0.2548, -0.2340, -0.2578],
         [-0.2513, -0.9494, -1.2821, -1.5227]]], dtype=torch.float64)

Filter All

The recursive version of the Kalman filter apparently breaks PyTorch's gradient computation, so a workaround is needed: during the loop the states are saved in a Python list and at the end they are combined back into a tensor. The last lines of the function:

  • convert the lists to tensors
  • put the dimensions in the correct order
filt_stateSR, pred_stateSR  = kSR._filter_all(data, mask, control)
(ms, P_Cs), (m_ms, P_m_Cs) = filt_stateSR, pred_stateSR

Predictions at time 0 for both batches

show_as_row(*map(Self.shape(), (m_ms, P_m_Cs, ms, P_Cs,)))

#0

torch.Size([2, 10, 4, 1])

#1

torch.Size([2, 10, 4, 4])

#2

torch.Size([2, 10, 4, 1])

#3

torch.Size([2, 10, 4, 4])
show_as_row(*map(lambda x:x[0][0], (m_ms, P_m_Cs, ms, P_Cs,)))

#0

tensor([[0.4499],
        [0.8575],
        [0.2647],
        [0.4293]], dtype=torch.float64, grad_fn=)

#1

tensor([[1.2562, 0.0000, 0.0000, 0.0000],
        [0.3804, 0.9666, 0.0000, 0.0000],
        [0.9536, 0.7909, 1.2431, 0.0000],
        [0.4487, 0.4993, 0.4600, 1.2599]], dtype=torch.float64,
       grad_fn=)

#2

tensor([[0.5189],
        [0.9208],
        [0.4203],
        [0.5421]], dtype=torch.float64, grad_fn=)

#3

tensor([[-1.1638,  0.0000,  0.0000,  0.0000],
        [-0.2344, -0.9142,  0.0000,  0.0000],
        [-0.5963, -0.5742, -1.1218,  0.0000],
        [-0.1702, -0.3039, -0.2622,  1.2088]], dtype=torch.float64,
       grad_fn=)

Filter

The filter method wraps _filter_all but in addition:

  • returns only the filtered state
  • removes the last dimension from the mean

source

KalmanFilterSR.filter

 KalmanFilterSR.filter (obs:torch.Tensor, mask:torch.Tensor,
                        control:torch.Tensor)

Filter observation

Type Details
obs Tensor ([n_batches], n_obs, [self.n_dim_obs]) where n_batches and n_dim_obs dimensions can be omitted if 1
mask Tensor ([n_batches], n_obs, [self.n_dim_obs]) where n_batches and n_dim_obs dimensions can be omitted if 1
control Tensor ([n_batches], n_obs, [self.n_dim_contr])
Returns ListMultiNormal Filtered state (n_batches, n_obs, self.n_dim_state)
filtSR = kSR.filter(data, mask, control)
filtSR.mean.shape, filtSR.cov.shape
(torch.Size([2, 10, 4, 1]), torch.Size([2, 10, 4, 4]))
def fuzz_filter_SR(n=10):
    errs = {'mean': [], 'cov': []}
    for _ in range(n):
        kSR = KalmanFilterSR.init_random(5,10,4)
        k = KalmanFilter.init_from(kSR)
        dat = get_test_data(20, 5, 4)
        mean, cov = k.filter(*dat)
        meanSR, covSR = kSR.filter(*dat)
        covSR = covSR @ covSR.mT
        errs['mean'].append((meanSR - mean).abs().max().item())
        errs['cov'].append((covSR - cov).abs().max().item())
    return pd.DataFrame(errs)

err = fuzz_filter_SR(10)

err.median(), err.max()
(mean    8.108847e-12
 cov     8.149592e-12
 dtype: float64,
 mean    4.446576e-11
 cov     1.261309e-10
 dtype: float64)

Smooth

Smooth update step

Compute the probability of the state at time \(t\) given all the observations

\(p(x_t|Y) = \mathcal{N}(x_t; m_t^s, P_t^s)\) where:

  • Kalman smoothing gain: \(G_t = P_tA^T(P_{t+1}^-)^{-1}\)
  • smoothed mean: \(m_t^s = m_t + G_t(m_{t+1}^s - m_{t+1}^-)\)
  • smoothed covariance: \(P_t^s = P_t + G_t(P_{t+1}^s - P_{t+1}^-)G_t^T\)
filt_state = ListMNormal(filt_stateSR.mean, filt_stateSR.cov @ filt_stateSR.cov.mT)
pred_state = ListMNormal(pred_stateSR.mean, pred_stateSR.cov @ pred_stateSR.cov.mT)
K_p = _smooth_gain_SR(kSR.A, filt_stateSR[:, 0].cov, pred_stateSR[:, 0].cov)
test_close(
    _smooth_gain_SR(kSR.A, filt_stateSR[:, 0].cov, pred_stateSR[:, 0].cov),
    _smooth_gain(kSR.A, filt_state[:, 0].cov, pred_state[:, 0].cov)
)
test_close(filt_state[0,0].cov, filt_stateSR[0,0].cov @ filt_stateSR[0,0].cov.mT)
test_close(pred_state[0,0].cov, pred_stateSR[0,0].cov @ pred_stateSR[0,0].cov.mT)
def fuzz_gain_SR(n=10):
    errs = {'K': []}
    for _ in range(n):
        kSR = KalmanFilterSR.init_random(5,10,4)
        k = KalmanFilter.init_from(kSR)
        dat = get_test_data(2, 5, 4)
        (_, f_cov), (_, p_cov) = k._filter_all(*dat)
        (_, f_covSR), (_, p_covSR) = kSR._filter_all(*dat)
        K = _smooth_gain(k.A, f_cov[:, 0], p_cov[:, 0])
        K_SR = _smooth_gain_SR(k.A, f_covSR[:, 0], p_covSR[:, 0])
        errs['K'].append((K - K_SR).abs().max().item())
    return pd.DataFrame(errs)

err = fuzz_gain_SR(10)

err.median(), err.max()
(K    1.920686e-14
 dtype: float64,
 K    4.352074e-14
 dtype: float64)
m_p, P_p = _smooth_update_SR(kSR.A, filt_stateSR[:, 0, :], pred_stateSR[:, 0, :], filt_stateSR[:, 0, :])
test_close(filt_state[:, 0, :].mean, filt_stateSR[:, 0, :].mean)
_smooth_update(kSR.A, filt_state[0,0],  pred_state[0,0] , filt_state[0,0] )
MultiNormal(mean=tensor([[[0.6325],
         [0.9899],
         [0.5680],
         [0.6284]]], dtype=torch.float64, grad_fn=<AddBackward0>), cov=tensor([[[ 0.7480, -0.0959, -0.0944, -0.2622],
         [-0.0959,  0.6667,  0.1856,  0.0380],
         [-0.0944,  0.1856,  0.9192, -0.0280],
         [-0.2622,  0.0380, -0.0280,  1.3021]]], dtype=torch.float64,
       grad_fn=<DivBackward0>))
_smooth_update_SR(kSR.A, filt_stateSR[0,0],  pred_stateSR[0,0], filt_stateSR[0,0])
MultiNormal(mean=tensor([[[0.6325],
         [0.9899],
         [0.5680],
         [0.6284]]], dtype=torch.float64, grad_fn=<AddBackward0>), cov=tensor([[[-1.8502, -2.4971, -5.2813, -3.0804],
         [-2.4971, -1.6908, -4.8194, -2.8887],
         [-5.2813, -4.8194, -9.7933, -6.0797],
         [-3.0804, -2.8887, -6.0797, -2.6434]]], dtype=torch.float64,
       grad_fn=<DivBackward0>))
test_close((filt_stateSR.mean, pred_stateSR.mean), (filt_state.mean, pred_state.mean))
test_close(
    _smooth_update_SR(kSR.A, filt_stateSR[:, 0], pred_stateSR[:, 0], filt_state[:, 0]),
    _smooth_update(kSR.A, filt_state[:, 0], pred_state[:, 0], filt_state[:, 0])
)
def fuzz_update_SR(n=10):
    errs = {'mean': [], 'cov': []}
    for _ in range(n):
        kSR = KalmanFilterSR.init_random(5,10,4)
        k = KalmanFilter.init_from(kSR)
        dat = get_test_data(2, 5, 4)
        f_state, p_state = k._filter_all(*dat)
        f_stateSR, p_stateSR = kSR._filter_all(*dat)
        mean, cov = _smooth_update(k.A, f_state[:, 0], p_state[:, 0], f_state[:, 1])
        meanSR, covSR = _smooth_update_SR(k.A, f_stateSR[:, 0], p_stateSR[:, 0], f_state[:, 1])
        errs['mean'].append((meanSR - mean).abs().max().item())
        errs['cov'].append((covSR - cov).abs().max().item())
    return pd.DataFrame(errs)

err = fuzz_update_SR(10)

err.median(), err.max()
(mean    1.154632e-14
 cov     2.771117e-13
 dtype: float64,
 mean    3.375078e-14
 cov     8.242296e-13
 dtype: float64)

Smooth

smooth_state = _smooth_SR(kSR.A,  filt_stateSR, pred_stateSR)
is_sr(filt_stateSR.cov, filt_state.cov)
True
s_mean, s_cov =  _smooth(kSR.A,  filt_state, pred_state)
s_meanSR, s_covSR =  _smooth_SR(kSR.A,  filt_stateSR, pred_stateSR)
test_close(s_cov, s_covSR)
(s_cov - s_covSR).median(), (s_cov - s_covSR).max()
(tensor(0., dtype=torch.float64, grad_fn=<MedianBackward0>),
 tensor(3.5527e-15, dtype=torch.float64, grad_fn=<MaxBackward1>))
show_as_row(smooth_state.mean[0][0], smooth_state.cov[0][0])

#0

tensor([[-0.3279],
        [ 0.4029],
        [-0.7501],
        [-0.0098]], dtype=torch.float64, grad_fn=)

#1

tensor([[ 0.9163, -0.0046,  0.0978, -0.1494],
        [-0.0046,  0.7132,  0.2841,  0.0932],
        [ 0.0978,  0.2841,  1.1262,  0.0916],
        [-0.1494,  0.0932,  0.0916,  1.3623]], dtype=torch.float64,
       grad_fn=)
show_as_row(smooth_state.mean.shape, smooth_state.cov.shape)

#0

torch.Size([2, 10, 4, 1])

#1

torch.Size([2, 10, 4, 4])

KalmanFilter method

mask
tensor([[[ True, False, False],
         [ True, False,  True],
         [False,  True, False],
         [False, False, False],
         [False, False,  True],
         [ True,  True,  True],
         [ True,  True,  True],
         [ True,  True, False],
         [ True, False,  True],
         [ True,  True,  True]],

        [[ True,  True,  True],
         [ True,  True, False],
         [ True,  True,  True],
         [ True, False,  True],
         [False,  True,  True],
         [ True,  True,  True],
         [ True,  True,  True],
         [ True,  True,  True],
         [ True,  True,  True],
         [ True,  True,  True]]])
torch.argwhere((~mask).any(-1).any(0)).min()
tensor(0)
torch.ones(1,2,0)
tensor([], size=(1, 2, 0))

source

KalmanFilterSR.smooth

 KalmanFilterSR.smooth (obs:torch.Tensor, mask:torch.Tensor,
                        control:torch.Tensor)

Kalman Filter Smoothing

Type Details
obs Tensor
mask Tensor
control Tensor
Returns ListMultiNormal [n_timesteps, n_dim_state] smoothed state
smoothed_state = kSR.smooth(data, mask, control)
show_as_row(smoothed_state.mean.shape, smoothed_state.cov.shape)

#0

torch.Size([2, 10, 4, 1])

#1

torch.Size([2, 10, 4, 4])
smoothed_state.cov.isnan().any()
tensor(False)
smoothed_state.mean.sum().backward(retain_graph=True)
A.grad
tensor([[[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]]], dtype=torch.float64)
smoothed_state_stand = k.smooth(data, mask, control)
(smoothed_state.mean - smoothed_state_stand.mean).max()
(smoothed_state.cov - smoothed_state_stand.cov).max()
tensor(0.6755, dtype=torch.float64, grad_fn=<MaxBackward1>)
def fuzz_smooth_SR(n=10):
    errs = {'mean': [], 'cov': []}
    for _ in range(n):
        kSR = KalmanFilterSR.init_random(5,10,4)
        k = KalmanFilter.init_from(kSR)
        dat = get_test_data(20, 5, 4)
        mean, cov = k.smooth(*dat)
        meanSR, covSR = kSR.smooth(*dat)
        errs['mean'].append((meanSR - mean).abs().max().item())
        errs['cov'].append((covSR - cov).abs().max().item())
    return pd.DataFrame(errs)

err = fuzz_smooth_SR(10)

err.median(), err.max()
(mean    9.733464e-12
 cov     8.838374e-12
 dtype: float64,
 mean    4.984400e-09
 cov     4.896766e-09
 dtype: float64)

Predict

The predict method can be vectorized across both the batch and the timesteps, except for timesteps that require conditional predictions.

Obs from State
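The smoothed state is mapped through the observation model using the formula from the introduction; a minimal sketch (illustrative names, not the library's _obs_from_state):

def obs_from_state_sketch(H, d, R, state_mean, state_cov):
    # p(y_t | x_t) = N(H m + d, R + H P H^T)
    mean = H @ state_mean + d.unsqueeze(-1)
    cov = R + H @ state_cov @ H.mT
    return mean, cov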

smoothed_state.mean.shape, smoothed_state.cov.shape
(torch.Size([2, 10, 4, 1]), torch.Size([2, 10, 4, 4]))
pred_obs0 = kSR._obs_from_state(smoothed_state)
pred_obs0.mean.shape, pred_obs0.cov.shape
(torch.Size([2, 10, 3, 1]), torch.Size([2, 10, 3, 3]))
pred_obs0.cov.isnan().any()
tensor(False)
gap_mask = ~mask.all(-1)
gap_mask.shape
torch.Size([2, 10])

predict has several modes:

  • if pred_only_gap is True, it returns predictions only where the mask is False:
    • if use_conditional is True, it returns a list (one per batch) of lists (one per timestep) of tensors of shape [1, gap_len]
    • if use_conditional is False, it returns a list (one per batch) of tensors of shape [n_times_gap, n_dim_obs]

Masked Batch

gap_mask = ~mask.all(-1)
from pprint import pp
pp(_masked2batch(mask[gap_mask], mask)[0])
[tensor([False, False]),
 tensor([False]),
 tensor([False, False]),
 tensor([False, False, False]),
 tensor([False, False]),
 tensor([False]),
 tensor([False])]
str(_masked2batch(mask[gap_mask], mask)[0])
'[tensor([False, False]), tensor([False]), tensor([False, False]), tensor([False, False, False]), tensor([False, False]), tensor([False]), tensor([False])]'
show_as_row(all_mask = mask[0] , only_gap=_masked2batch(mask[gap_mask], mask)[0])

all_mask

tensor([[ True, False, False],
        [ True, False,  True],
        [False,  True, False],
        [False, False, False],
        [False, False,  True],
        [ True,  True,  True],
        [ True,  True,  True],
        [ True,  True, False],
        [ True, False,  True],
        [ True,  True,  True]])

only_gap

[tensor([False, False]),
 tensor([False]),
 tensor([False, False]),
 tensor([False, False, False]),
 tensor([False, False]),
 tensor([False]),
 tensor([False])]
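The conversion above can be reproduced with a small sketch of _masked2batch (a hypothetical reimplementation, assuming mask has shape [n_batch, n_times, n_dim_obs] and x_gap holds the timesteps selected by gap_mask):

def masked2batch_sketch(x_gap, mask):
    gap_mask = (~mask).any(-1)
    out, i = [], 0
    for b in range(mask.shape[0]):
        rows = []
        for t in range(mask.shape[1]):
            if gap_mask[b, t]:
                # keep only the entries at the gap positions of this timestep
                rows.append(x_gap[i][~mask[b, t]])
                i += 1
        out.append(rows)
    return out

With these assumptions, masked2batch_sketch(mask[gap_mask], mask)[0] reproduces the list shown above.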

Predict


source

KalmanFilterSR.predict

 KalmanFilterSR.predict (obs, mask, control, smooth=True)

Predicted observations at all times

Exploration


source

with_settings

 with_settings (k, **kwargs)

source

replacing_ctx

 replacing_ctx (*args)
with with_settings(kSR, use_conditional=False, pred_only_gap=False):
    pred_mean, pred_cov = kSR.predict(data, mask, control)
show_as_row(mean= pred_mean.shape, cov = pred_cov.shape)

mean

torch.Size([2, 10, 3])

cov

torch.Size([2, 10, 3, 3])
with with_settings(kSR, use_conditional=False, pred_only_gap=True):
    pred_mean, pred_cov = kSR.predict(data, mask, control)
show_as_row(mean= pred_mean[0], cov = pred_cov[0])

mean

[tensor([-0.3666,  0.3765], dtype=torch.float64, grad_fn=),
 tensor([-0.0844], dtype=torch.float64, grad_fn=),
 tensor([0.1625, 0.3119], dtype=torch.float64, grad_fn=),
 tensor([-0.1102, -0.5474, -0.1901], dtype=torch.float64,
       grad_fn=),
 tensor([-0.1439, -0.4996], dtype=torch.float64, grad_fn=),
 tensor([0.4966], dtype=torch.float64, grad_fn=),
 tensor([0.0202], dtype=torch.float64, grad_fn=)]

cov

[tensor([[2.5518, 2.1682],
        [2.1682, 2.8766]], dtype=torch.float64, grad_fn=),
 tensor([[2.3344]], dtype=torch.float64, grad_fn=),
 tensor([[1.9095, 0.5371],
        [0.5371, 3.0221]], dtype=torch.float64, grad_fn=),
 tensor([[2.0929, 1.1465, 0.8395],
        [1.1465, 2.8696, 2.5367],
        [0.8395, 2.5367, 3.5092]], dtype=torch.float64,
       grad_fn=),
 tensor([[1.9076, 0.8393],
        [0.8393, 2.3729]], dtype=torch.float64, grad_fn=),
 tensor([[2.7991]], dtype=torch.float64, grad_fn=),
 tensor([[2.2841]], dtype=torch.float64, grad_fn=)]
with with_settings(kSR, use_conditional=True, pred_only_gap=True):
    pred_mean, pred_cov = kSR.predict(data, mask, control)
show_as_row(mean= pred_mean[0], cov = pred_cov[0])

mean

[tensor([0.0292, 0.6235], dtype=torch.float64, grad_fn=),
 tensor([0.4156], dtype=torch.float64, grad_fn=),
 tensor([0.3672, 0.7999], dtype=torch.float64, grad_fn=),
 tensor([-0.1102, -0.5474, -0.1901], dtype=torch.float64,
       grad_fn=),
 tensor([-0.0686, -0.1575], dtype=torch.float64, grad_fn=),
 tensor([0.4823], dtype=torch.float64, grad_fn=),
 tensor([0.1720], dtype=torch.float64, grad_fn=)]

cov

[tensor([[2.5518, 2.1682],
        [2.1682, 2.8766]], dtype=torch.float64, grad_fn=),
 tensor([[2.3344]], dtype=torch.float64, grad_fn=),
 tensor([[1.9095, 0.5371],
        [0.5371, 3.0221]], dtype=torch.float64, grad_fn=),
 tensor([[2.0929, 1.1465, 0.8395],
        [1.1465, 2.8696, 2.5367],
        [0.8395, 2.5367, 3.5092]], dtype=torch.float64,
       grad_fn=),
 tensor([[1.9076, 0.8393],
        [0.8393, 2.3729]], dtype=torch.float64, grad_fn=),
 tensor([[2.7991]], dtype=torch.float64, grad_fn=),
 tensor([[2.2841]], dtype=torch.float64, grad_fn=)]
with with_settings(kSR, use_conditional = True, pred_only_gap = False):
    test_fail(kSR.predict, [data, mask, control]) # this parameter combination is invalid

Gap only prediction

Copied from another notebook to make the visualization easier.


source

buffer_pred_single

 buffer_pred_single (preds:list[torch.Tensor], masks:torch.Tensor)

For predictions that cover only the gaps, add a buffer of NaN so they have the same shape as the targets


source

buffer_pred

 buffer_pred (preds:list[list[torch.Tensor]], masks:torch.Tensor)

For predictions that cover only the gaps, add a buffer of NaN so they have the same shape as the targets
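A minimal sketch of the single-batch padding (hypothetical name; assuming preds holds one flat tensor per gap timestep, in time order):

import torch

def buffer_pred_single_sketch(preds, masks):
    # start from an all-NaN tensor with the shape of the targets
    out = torch.full(masks.shape, float('nan'), dtype=preds[0].dtype)
    gap_rows = (~masks).any(-1).nonzero(as_tuple=True)[0]
    for row, p in zip(gap_rows, preds):
        # fill only the positions that were missing at this timestep
        out[row, ~masks[row]] = p
    return out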

with with_settings(kSR, use_conditional=False, pred_only_gap=True):
    pred_m_gap, _ = kSR.predict(data, mask, control)
    
with with_settings(kSR, use_conditional=False, pred_only_gap=False):
    pred_m, _ = kSR.predict(data, mask, control)

So this is a problem: these should be the same!

show_as_row(gap = buffer_pred(pred_m_gap, mask), no_gap = pred_m)

gap

tensor([[[    nan, -0.3666,  0.3765],
         [    nan, -0.0844,     nan],
         [ 0.1625,     nan,  0.3119],
         [-0.1102, -0.5474, -0.1901],
         [-0.1439, -0.4996,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,  0.4966],
         [    nan,  0.0202,     nan],
         [    nan,     nan,     nan]],

        [[    nan,     nan,     nan],
         [    nan,     nan,  0.2059],
         [    nan,     nan,     nan],
         [    nan, -0.3611,     nan],
         [ 0.1032,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan]]], dtype=torch.float64)

no_gap

tensor([[[ 0.0322, -0.3666,  0.3765],
         [ 0.1475, -0.0844,  0.4320],
         [ 0.1625, -0.0796,  0.3119],
         [-0.1102, -0.5474, -0.1901],
         [-0.1439, -0.4996,  0.0158],
         [ 0.1736, -0.0500,  0.1638],
         [ 0.0998, -0.2115,  0.2026],
         [ 0.2642,  0.0727,  0.4966],
         [ 0.2667,  0.0202,  0.5999],
         [ 0.5621,  0.4854,  1.0911]],

        [[ 0.0523, -0.3558,  0.1773],
         [ 0.0572, -0.2074,  0.2059],
         [ 0.0361, -0.2809,  0.0575],
         [-0.0671, -0.3611,  0.1202],
         [ 0.1032, -0.2193, -0.1928],
         [-0.1298, -0.5874, -0.3787],
         [ 0.0450, -0.2554, -0.2029],
         [ 0.0938, -0.0932,  0.3782],
         [ 0.2229, -0.1440, -0.1133],
         [ 0.3722,  0.3017,  1.1004]]], dtype=torch.float64,
       grad_fn=)
state = kSR.smooth(data, mask, control)
state.mean.shape
torch.Size([2, 10, 4, 1])
gap_mask = ~mask.all(-1)
# this destroys the batch structure! it is rebuilt below
state_gap = state[gap_mask]
state_gap.mean.shape
torch.Size([10, 4, 1])
pred_obs = kSR._obs_from_state(state)
pred_obs.mean.squeeze_(-1)
tensor([[[ 0.0322, -0.3666,  0.3765],
         [ 0.1475, -0.0844,  0.4320],
         [ 0.1625, -0.0796,  0.3119],
         [-0.1102, -0.5474, -0.1901],
         [-0.1439, -0.4996,  0.0158],
         [ 0.1736, -0.0500,  0.1638],
         [ 0.0998, -0.2115,  0.2026],
         [ 0.2642,  0.0727,  0.4966],
         [ 0.2667,  0.0202,  0.5999],
         [ 0.5621,  0.4854,  1.0911]],

        [[ 0.0523, -0.3558,  0.1773],
         [ 0.0572, -0.2074,  0.2059],
         [ 0.0361, -0.2809,  0.0575],
         [-0.0671, -0.3611,  0.1202],
         [ 0.1032, -0.2193, -0.1928],
         [-0.1298, -0.5874, -0.3787],
         [ 0.0450, -0.2554, -0.2029],
         [ 0.0938, -0.0932,  0.3782],
         [ 0.2229, -0.1440, -0.1133],
         [ 0.3722,  0.3017,  1.1004]]], dtype=torch.float64,
       grad_fn=<SqueezeBackward3>)
pred_obs_gap = kSR._obs_from_state(state_gap)
pred_obs_gap.mean.squeeze_(-1)
tensor([[ 0.0322, -0.3666,  0.3765],
        [ 0.1475, -0.0844,  0.4320],
        [ 0.1625, -0.0796,  0.3119],
        [-0.1102, -0.5474, -0.1901],
        [-0.1439, -0.4996,  0.0158],
        [ 0.2642,  0.0727,  0.4966],
        [ 0.2667,  0.0202,  0.5999],
        [ 0.0572, -0.2074,  0.2059],
        [-0.0671, -0.3611,  0.1202],
        [ 0.1032, -0.2193, -0.1928]], dtype=torch.float64,
       grad_fn=<SqueezeBackward3>)
(pred_obs[gap_mask].mean == pred_obs_gap.mean).all() # so far good
tensor(True)
pred_obs.mean.shape
torch.Size([2, 10, 3])
mask.shape
torch.Size([2, 10, 3])
show_as_row(pred = buffer_pred(_masked2batch(pred_obs_gap.mean, mask), mask), mask = mask, all = pred_obs.mean)

pred

tensor([[[    nan, -0.3666,  0.3765],
         [    nan, -0.0844,     nan],
         [ 0.1625,     nan,  0.3119],
         [-0.1102, -0.5474, -0.1901],
         [-0.1439, -0.4996,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,  0.4966],
         [    nan,  0.0202,     nan],
         [    nan,     nan,     nan]],

        [[    nan,     nan,     nan],
         [    nan,     nan,  0.2059],
         [    nan,     nan,     nan],
         [    nan, -0.3611,     nan],
         [ 0.1032,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan]]], dtype=torch.float64)

mask

tensor([[[ True, False, False],
         [ True, False,  True],
         [False,  True, False],
         [False, False, False],
         [False, False,  True],
         [ True,  True,  True],
         [ True,  True,  True],
         [ True,  True, False],
         [ True, False,  True],
         [ True,  True,  True]],

        [[ True,  True,  True],
         [ True,  True, False],
         [ True,  True,  True],
         [ True, False,  True],
         [False,  True,  True],
         [ True,  True,  True],
         [ True,  True,  True],
         [ True,  True,  True],
         [ True,  True,  True],
         [ True,  True,  True]]])

all

tensor([[[ 0.0322, -0.3666,  0.3765],
         [ 0.1475, -0.0844,  0.4320],
         [ 0.1625, -0.0796,  0.3119],
         [-0.1102, -0.5474, -0.1901],
         [-0.1439, -0.4996,  0.0158],
         [ 0.1736, -0.0500,  0.1638],
         [ 0.0998, -0.2115,  0.2026],
         [ 0.2642,  0.0727,  0.4966],
         [ 0.2667,  0.0202,  0.5999],
         [ 0.5621,  0.4854,  1.0911]],

        [[ 0.0523, -0.3558,  0.1773],
         [ 0.0572, -0.2074,  0.2059],
         [ 0.0361, -0.2809,  0.0575],
         [-0.0671, -0.3611,  0.1202],
         [ 0.1032, -0.2193, -0.1928],
         [-0.1298, -0.5874, -0.3787],
         [ 0.0450, -0.2554, -0.2029],
         [ 0.0938, -0.0932,  0.3782],
         [ 0.2229, -0.1440, -0.1133],
         [ 0.3722,  0.3017,  1.1004]]], dtype=torch.float64,
       grad_fn=)
pred_gap_buff = buffer_pred(_masked2batch(pred_obs_gap.mean, mask), mask)
mask_na = ~pred_gap_buff.isnan()
test_close(pred_gap_buff[mask_na], pred_obs.mean[mask_na])
with with_settings(kSR, use_conditional=False, pred_only_gap=True):
    pred_gap_buff = buffer_pred(kSR.predict(data, mask, control).mean, mask)
mask_na = ~pred_gap_buff.isnan()
with with_settings(kSR, use_conditional=False, pred_only_gap=False):
    pred_ng = kSR.predict(data, mask, control).mean
test_close(pred_gap_buff[mask_na], pred_ng[mask_na])
show_as_row(pred = pred_gap_buff, mask = mask, all = pred_ng)

pred

tensor([[[    nan, -0.3666,  0.3765],
         [    nan, -0.0844,     nan],
         [ 0.1625,     nan,  0.3119],
         [-0.1102, -0.5474, -0.1901],
         [-0.1439, -0.4996,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,  0.4966],
         [    nan,  0.0202,     nan],
         [    nan,     nan,     nan]],

        [[    nan,     nan,     nan],
         [    nan,     nan,  0.2059],
         [    nan,     nan,     nan],
         [    nan, -0.3611,     nan],
         [ 0.1032,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan]]], dtype=torch.float64)

mask

tensor([[[ True, False, False],
         [ True, False,  True],
         [False,  True, False],
         [False, False, False],
         [False, False,  True],
         [ True,  True,  True],
         [ True,  True,  True],
         [ True,  True, False],
         [ True, False,  True],
         [ True,  True,  True]],

        [[ True,  True,  True],
         [ True,  True, False],
         [ True,  True,  True],
         [ True, False,  True],
         [False,  True,  True],
         [ True,  True,  True],
         [ True,  True,  True],
         [ True,  True,  True],
         [ True,  True,  True],
         [ True,  True,  True]]])

all

tensor([[[ 0.0322, -0.3666,  0.3765],
         [ 0.1475, -0.0844,  0.4320],
         [ 0.1625, -0.0796,  0.3119],
         [-0.1102, -0.5474, -0.1901],
         [-0.1439, -0.4996,  0.0158],
         [ 0.1736, -0.0500,  0.1638],
         [ 0.0998, -0.2115,  0.2026],
         [ 0.2642,  0.0727,  0.4966],
         [ 0.2667,  0.0202,  0.5999],
         [ 0.5621,  0.4854,  1.0911]],

        [[ 0.0523, -0.3558,  0.1773],
         [ 0.0572, -0.2074,  0.2059],
         [ 0.0361, -0.2809,  0.0575],
         [-0.0671, -0.3611,  0.1202],
         [ 0.1032, -0.2193, -0.1928],
         [-0.1298, -0.5874, -0.3787],
         [ 0.0450, -0.2554, -0.2029],
         [ 0.0938, -0.0932,  0.3782],
         [ 0.2229, -0.1440, -0.1133],
         [ 0.3722,  0.3017,  1.1004]]], dtype=torch.float64,
       grad_fn=)

Conditional
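With use_conditional the missing entries of \(y_t\) are additionally conditioned on the values observed at the same timestep (note below that at the timestep where all three dimensions are missing the two predictions agree). A minimal sketch of the Gaussian conditioning for one timestep (illustrative names, not the library's exact code):

import torch

def condition_on_observed(mean, cov, y, observed):
    # observed: boolean mask over the observation dimensions of one timestep
    o, m = observed, ~observed
    S_oo, S_mo = cov[o][:, o], cov[m][:, o]
    # mu_{m|o} = mu_m + S_mo S_oo^{-1} (y_o - mu_o)
    cond_mean = mean[m] + S_mo @ torch.linalg.solve(S_oo, y[o] - mean[o])
    # S_{mm|o} = S_mm - S_mo S_oo^{-1} S_om
    cond_cov = cov[m][:, m] - S_mo @ torch.linalg.solve(S_oo, S_mo.mT)
    return cond_mean, cond_cov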

with with_settings(kSR, use_conditional=False, pred_only_gap=True):
    pred_gap = buffer_pred(kSR.predict(data, mask, control).mean, mask)
with with_settings(kSR, use_conditional=True, pred_only_gap=True):
    pred_gap_cond = buffer_pred(kSR.predict(data, mask, control).mean, mask)
show_as_row(no_conditional = pred_gap, conditional = pred_gap_cond)

no_conditional

tensor([[[    nan, -0.3666,  0.3765],
         [    nan, -0.0844,     nan],
         [ 0.1625,     nan,  0.3119],
         [-0.1102, -0.5474, -0.1901],
         [-0.1439, -0.4996,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,  0.4966],
         [    nan,  0.0202,     nan],
         [    nan,     nan,     nan]],

        [[    nan,     nan,     nan],
         [    nan,     nan,  0.2059],
         [    nan,     nan,     nan],
         [    nan, -0.3611,     nan],
         [ 0.1032,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan]]], dtype=torch.float64)

conditional

tensor([[[    nan,  0.0292,  0.6235],
         [    nan,  0.4156,     nan],
         [ 0.3672,     nan,  0.7999],
         [-0.1102, -0.5474, -0.1901],
         [-0.0686, -0.1575,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,  0.4823],
         [    nan,  0.1720,     nan],
         [    nan,     nan,     nan]],

        [[    nan,     nan,     nan],
         [    nan,     nan,  0.5191],
         [    nan,     nan,     nan],
         [    nan,  0.2014,     nan],
         [ 0.6513,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan],
         [    nan,     nan,     nan]]], dtype=torch.float64)
mask[gap_mask]
tensor([[ True, False, False],
        [ True, False,  True],
        [False,  True, False],
        [False, False, False],
        [False, False,  True],
        [ True,  True, False],
        [ True, False,  True],
        [ True,  True, False],
        [ True, False,  True],
        [False,  True,  True]])
data[gap_mask]
tensor([[0.8775,    nan,    nan],
        [0.6706,    nan, 0.9272],
        [   nan, 0.4967,    nan],
        [   nan,    nan,    nan],
        [   nan,    nan, 0.4760],
        [0.9991, 0.1775,    nan],
        [0.6734,    nan, 0.6468],
        [0.3725, 0.2052,    nan],
        [0.5927,    nan, 0.6441],
        [   nan, 0.9132, 0.0329]], dtype=torch.float64)
assert is_posdef(pred_cov[0][0]).all()
state = kSR.smooth(data, mask, control)
assert not state.cov.isnan().any()
kSR.use_sr_pred = False
pred_obs = kSR._obs_from_state(state)
assert not pred_obs.cov.isnan().any()
cov2std(pred_obs.cov).isnan().any()
tensor(False)
assert not kSR.predict(data, mask, control).cov.isnan().any()
assert not kSR.predict(data, mask, control, smooth=False).cov.isnan().any()
kSR._predict_filter(data, mask, control).std.shape
torch.Size([2, 10, 3, 3])
kSR.use_conditional = False
pred = kSR.predict(data, mask, control, smooth=True)
pred.mean.shape, pred.std.shape
AttributeError: 'ListMultiNormal' object has no attribute 'std'
filt_state, pred_state = kSR._filter_all(data, mask, control)
mean, cov = kSR._obs_from_state(pred_state)
is_posdef(cov @ cov.mT)
pred

Debug NaN

import polars as pl
import altair as alt
kSR.predict(*get_test_data(200)).cov.isnan().any()
data.shape
reset_seed()

SR Filter

nan = [{'nan':kSR.predict(*get_test_data(n), smooth=smooth).cov.isnan().any().item(), 'n': n, 'rep': rep, 'smooth': smooth}
       for n in [10, 20, 30, 40, 50, 55, 60, 70, 100, 150, 200] for rep in range(10) for smooth in [True, False]]
nan_df = pl.DataFrame(nan).groupby(['nan', 'n', 'smooth']).count().to_pandas()
alt.Chart(nan_df).mark_line().encode(x='n:Q', y='count', color='nan', column='smooth')

Standard Filter

k = KalmanFilter.init_from(kSR)
nan = [{'nan':k.predict(*get_test_data(n), smooth=smooth).std.isnan().any().item(), 'n': n, 'rep': rep, 'smooth': smooth}
       for n in [10, 20, 30, 40, 50, 55, 60, 70, 100, 150, 200] for rep in range(10) for smooth in [True, False]]
nan_df = pl.DataFrame(nan).groupby(['nan', 'n', 'smooth']).count().to_pandas()
alt.Chart(nan_df).mark_line().encode(x='n:Q', y='count', color='nan', column='smooth')

So the standard filter works better than the SR filter for the smoothing (with this parameter setting), which suggests the SR smoother can still be fixed.

But first, let's see where the NaNs come from.

The problem is a negative number on the diagonal of the covariance: the matrix is not positive definite, so even the standard deviation is NaN.
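A minimal illustration of why this produces NaN (assuming the standard deviation is taken as the square root of the covariance diagonal):

import torch

cov = torch.tensor([[1.0,  0.2],
                    [0.2, -0.5]])  # negative entry on the diagonal
print(cov.diagonal().sqrt())               # tensor([1.0000,    nan])
print(torch.linalg.cholesky_ex(cov).info)  # non-zero: not positive definite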

for i in range(20):
    dat = get_test_data(10)
    pred = kSR.predict(*dat)
    if pred.cov.isnan().any():
        print(i)
        break
k = KalmanFilter.init_from(kSR)
for p1, p2 in zip(k.parameters(), kSR.parameters()):
    test_close(p1,p2)
f_state_stand = k.filter(*dat)
s_state_stand = k.smooth(*dat)
pred_stand = k.predict(*dat)
pred_stand[1, -1].std
(f_state_stand.mean - filt_state.mean).mean()
filt_state = kSR.filter(*dat)
s_state = kSR.smooth(*dat)
is_posdef(s_state.cov)
is_posdef(s_state[1, -1].cov)
filt_state[1,-1].cov
kSR.H @ s_state[1, -1].cov @ kSR.H.mT
kSR.use_sr_pred = False
pred_obs = kSR._obs_from_state(s_state)
pred_obs[1,-1].cov
kSR.predict(*dat)
pred.std[1,-1]

Additional

Constructors

Simple parameters


source

KalmanFilter.init_simple

KalmanFilter.init_simple(2).state_dict()

Local slope

Local slope models are an extension of the local level model that also keeps track of the slope in the state variable.

Given \(n\) as the number of dimensions of the observations

The transition matrix (A) is:

\[A = \left[\begin{array}{cc}I & I \\ 0 & I\end{array}\right]\]

where:

  • \(I \in \mathbb{R}^{n \times n}\)
  • \(A \in \mathbb{R}^{2n \times 2n}\)

The state is \(x \in \mathbb{R}^{2n \times 1}\), where the upper half keeps track of the level and the lower half of the slope; accordingly \(A \in \mathbb{R}^{2n \times 2n}\).

the observation matrix (H) is:

\[H = \left[\begin{array}{cc}I & 0 \end{array}\right]\]

For the multivariate case the 1s of the univariate model are replaced with identity matrices.

Assuming the control has the same dimensions as the observations, a local slope model has \(B \in \mathbb{R}^{n_{state} \times n_{contr}}\): \[ B = \begin{bmatrix} -I & I \\ 0 & 0 \end{bmatrix}\]
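A sketch of building these block matrices for n observation dimensions (hypothetical helper, not the library's constructor):

import torch

def local_slope_matrices(n):
    I, Z = torch.eye(n), torch.zeros(n, n)
    A = torch.cat([torch.cat([I, I], -1),
                   torch.cat([Z, I], -1)])   # [[I, I], [0, I]]
    H = torch.cat([I, Z], -1)                # [I, 0]
    B = torch.cat([torch.cat([-I, I], -1),
                   torch.cat([Z, Z], -1)])   # [[-I, I], [0, 0]]
    return A, H, B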


source

KalmanFilter.init_local_slope_pca

KalmanFilter.init_local_slope_pca(2,2,pd.DataFrame([[1,2], [2,4]])).state_dict()

Export